The biggest problem most people face with large language models is the lack of sufficient computing resources to run these powerful AI applications.
Microsoft published the paper "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone" in April 2024 and also open-sourced Phi3, an excellent small language model (SLM) that can run on a mobile phone.
So we set up a lower-end NVIDIA Jetson edge device and, using the Ollama model manager together with the Open WebUI interactive front end, successfully ran the Phi3 (3.8b) model on a machine with only a 6-core/1.4GHz ARM processor and 8GB of memory, building an AI chatbot that multiple people can use at the same time, with the result shown in the video below.
The main operations in the video are as follows:
- Ask the chatbot to introduce itself
- Ask about the population of China
- Write Python code that draws a regular pentagon
- Read an online article and write a 500-word summary in Chinese
- Solve the classic "chickens and rabbits in the same cage" problem
The entire process is quite smooth without any stuttering.
If you are satisfied with what the video shows, you can follow our installation tutorial below to build a dedicated chatbot of your own.
Model download and management

Here, Ollama is used as the manager of the language models, responsible for downloading the models and managing their storage. Since Ollama also provides a terminal chat function, it can also be used as a local chat tool on its own.
Install the Ollama model manager

This part of the installation is very simple. Enter the Ollama official website download area, as shown below:
Click the penguin icon in the middle to select the Linux platform, then click the "Copy" icon in the lower right corner to copy the one-line install command.
Paste the command into a terminal on the Jetson Xavier NX and execute it, as shown below:
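At the time of writing, the command copied from the download page is Ollama's official one-line installer for Linux (check the site for the current version):

```shell
# Official one-line installer for Ollama on Linux (from ollama.com/download);
# it requires network access and sudo rights on the Jetson
curl -fsSL https://ollama.com/install.sh | sh
```

On Linux the script also registers ollama as a background service, which is why no further setup is needed after it finishes.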
The installation is now complete, and ollama is running in the background as a service.
Download a model with Ollama

We can use Ollama to download large language models, which has the following three benefits:
- Simple and fast: just execute ollama pull <model name: version> to download
- Interrupted downloads can be resumed
- No special network access (such as a VPN) is required
Of course, these benefits only apply to the models supported by Ollama. You can click the Models button in the image below to access the support list.
Fortunately, well-known models such as Microsoft's Phi3, Meta's Llama3, Google's Gemma, Mistral, and Alibaba's Qwen (Qianwen) from China are all on the list. As for models not on the list, you can download them from HuggingFace or elsewhere and then process them with Ollama's model-conversion commands, but we will not cover that here. Now execute the following command to download the phi3 model:
ollama pull phi3 (downloads the latest version)
Since the latest phi3 version supported by Ollama is 3.8b, entering "ollama pull phi3" will download that version.
As for the Llama3 model, both 8b and 70b versions are supported, with 8b being the default: executing "ollama pull llama3" downloads the 8b model. If you want the 70b model, you need to execute "ollama pull llama3:70b".
Check downloaded models

You can execute the following command at any time to list the downloaded models.
ollama list
Run downloaded models

Use the "ollama run <model name:version>" command to run a model locally. For example, to run the phi3 model:
ollama run phi3
Now you can chat with the Phi3 model in a text terminal on the Jetson Xavier NX, but so far this only supports one-on-one conversation, which is not very practical. If you want one device to serve several users at the same time, you will need the Open WebUI tool.
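Front ends like Open WebUI talk to the ollama background service over its local HTTP API (port 11434 by default), which you can also query directly. A quick smoke test with curl might look like this (the prompt text is just an example):

```shell
# Query the locally served phi3 model via Ollama's REST API;
# assumes the ollama service is running on the default port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Introduce yourself in one sentence.",
  "stream": false
}'
```

If this returns a JSON response, any client on the local network can reach the model the same way.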
Installing the Open WebUI Interface

This application provides a fairly complete Docker installation image for the x86 platform, but does not provide one for the Jetson ARM platform, so this part must be installed manually. For complete instructions, refer to the "How to Install Without Docker" section on the official website.
Since manual installation requires Python >= 3.11, and none of the Jetson JetPack 5.1.2/5.1.3 or 6.0 DP releases ship with it, MiniConda is used to solve the Python version problem.
Downloading and Installing MiniConda

There are many tutorials online about installing MiniConda, so we won't go into detail here. The key point is to download a build for the aarch64 architecture. We recommend the build that ships with Python 3.11, which you can fetch on the Jetson device with the following command:
wget https://repo.anaconda.com/miniconda/Miniconda3-py311_24.3.0-0-Linux-aarch64.sh
Please proceed with the installation and environment creation yourself.
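For reference, running the installer and creating the environment might look like the following sketch ("openwebui" is just an example environment name):

```shell
# Run the MiniConda installer downloaded above, then create and
# activate a Python 3.11 environment for Open WebUI
bash Miniconda3-py311_24.3.0-0-Linux-aarch64.sh
conda create -n openwebui python=3.11 -y
conda activate openwebui
python -V    # confirm it reports Python 3.11.x
```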
Manual Installation of Open WebUI

Follow the installation steps below:
# Clone the code repository
git clone https://github.com/open-webui/open-webui.git
cd open-webui/

# Copy the required env file
cp -RPp .env.example .env

# Build the frontend with Node
sudo apt update && sudo apt install npm -y
npm i && npm run build

# Install the backend dependencies
cd ./backend
pip install -r requirements.txt -U
This completes the installation of the interactive environment for Open WebUI, and you are now ready to start the chatbot.
Starting the Chatbot

After completing the above work, you can start the chatbot. Before starting, we remind you to check the following:
Python version: if you have restarted the device, remember to first activate the virtual environment with "conda activate <virtual environment>", then run "python -V" to confirm the version.
Whether the ollama service is running: execute "ollama list" to check that it responds. If not, execute "ollama serve" to start the service, then open another terminal and execute "ollama list" again.
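The two checks above can be combined into a small sketch to run before each start (assuming the conda environment created in the previous section):

```shell
# Make sure the Python version and the ollama service are both ready
conda activate <virtual environment>
python -V    # should report 3.11 or newer

# If "ollama list" fails, start the service in the background and retry
if ! ollama list >/dev/null 2>&1; then
    nohup ollama serve >/tmp/ollama.log 2>&1 &
    sleep 2
    ollama list
fi
```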
Once everything is ready, you can execute the following command to start the Open Webui application:
cd <path>/open-webui/backend
bash start.sh
When the following screen appears, it indicates normal startup.
Next, open <IP_OF_JETSON>:8080 in a browser on a remote device (such as a laptop) to reach the following operation screen.
Since Open WebUI has management functions and allows multiple users to log in, you need to register a user as administrator the first time you log in. Any email and password will do; the system does not verify them, as they are only used for local management.
After entering the main screen (as shown below), you can click the settings in the upper right corner to change the display language. The dropdown under "Select a model" in the middle lists the models downloaded and managed by Ollama; select the one you want.
Finally, it's worth mentioning that Open WebUI launches models itself, so there is no need to start one with "ollama run" on the Jetson first. This lets you freely choose which large language model to run from within Open WebUI.
Now, over the local network, you can start chatting with the chatbot built on the Jetson device, just like in the video at the beginning.