Large language models (LLMs) like DeepSeek-R1 are gradually becoming a core component of edge intelligence applications. Running them directly on Jetson Orin offers key benefits:
- Fully offline operation
- Low-latency response
- Enhanced data privacy
This guide covers:
- Environment preparation
- Installing Ollama
- Running the DeepSeek-R1 model
- (Optional) Using Open WebUI for a web-based interface
Before you start, make sure your Jetson Orin has:
- Ubuntu 20.04 / 22.04 (JetPack 5.1.1+ recommended)
- NVIDIA CUDA Toolkit and drivers (included with JetPack)
- Docker (optional, for containerized deployment)
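As a quick sanity check (a minimal sketch; exact paths assume a standard JetPack install), you can confirm the L4T release, CUDA toolkit, and Docker from a terminal:

```bash
# L4T / JetPack release string
cat /etc/nv_tegra_release

# CUDA toolkit version (nvcc may not be on PATH; JetPack installs it under /usr/local/cuda)
/usr/local/cuda/bin/nvcc --version

# Docker, if you plan to use the containerized options below
docker --version
```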
⚙️ Tip: Run jetson_clocks and check nvpmodel to enable maximum performance mode for the best inference results.
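For example (a minimal sketch; the nvpmodel mode ID for maximum performance varies by Orin module, so check the query output first):

```bash
# Query the current power mode
sudo nvpmodel -q

# Switch to the maximum-performance mode (commonly mode 0 / MAXN on Orin; confirm for your module)
sudo nvpmodel -m 0

# Lock clocks at their maximum for consistent inference latency
sudo jetson_clocks
```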
Open a terminal and run the following command to install Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
- Installs the Ollama service and CLI tools.
- Automatically handles dependencies and configures the background service.
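After the script finishes, a quick check (assuming the script's default setup, which registers a systemd service on Linux) looks like this:

```bash
# Confirm the CLI is on your PATH
ollama --version

# Confirm the background service is active
systemctl status ollama
```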
Alternatively, you can run Ollama as a Docker container:

```bash
sudo docker run --runtime=nvidia --rm --network=host \
  -v ~/ollama:/ollama \
  -e OLLAMA_MODELS=/ollama \
  dustynv/ollama:r36.4.0
```

🧩 The Docker image is maintained by the NVIDIA community (dustynv) and optimized for Jetson.
Verify Ollama is running:

```bash
ss -tuln | grep 11434
```

Expected output:

```
LISTEN 0 128 127.0.0.1:11434 ...
```

If port 11434 is listening, the Ollama service has started successfully.
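You can also query the HTTP API directly (assuming the default bind address of 127.0.0.1:11434):

```bash
# Returns a small JSON payload with the server version if Ollama is reachable
curl http://127.0.0.1:11434/api/version
```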
To run the 1.5B parameter version:
```bash
ollama run deepseek-r1:1.5b
```
- Ollama will automatically download the model if it is not cached locally.
- Starts an interactive conversation in the command line.
💡 Depending on your hardware capability, you can replace 1.5b with 8b, 14b, etc.
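Besides the interactive CLI, the same model can be queried over Ollama's REST API, which is handy for scripting (a sketch using the /api/generate endpoint; the prompt is just an example):

```bash
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Explain what an embedded GPU is in one sentence.",
  "stream": false
}'
```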
Open WebUI provides a user-friendly browser-based chat interface.
```bash
sudo docker run -d --network=host \
  -v ${HOME}/open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Access the WebUI by opening your browser at:
http://localhost:8080/

(With --network=host there is no port mapping, so Open WebUI listens on its default port 8080.)
- You can interact with the DeepSeek-R1 model graphically.
- View conversation history and review model responses directly in the browser.
📉 The initial model load may take about 30 seconds to 1 minute; subsequent runs will be faster thanks to caching.
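If the page does not load, the container's state and logs are the first things to check (standard Docker commands; the container name open-webui matches the run command above):

```bash
# Is the container running?
sudo docker ps --filter name=open-webui

# Follow the WebUI logs for startup errors or connection problems with Ollama
sudo docker logs -f open-webui
```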
7. Troubleshooting

Useful data locations:

```
~/ollama/        # Model cache directory
~/open-webui/    # WebUI persistent data
```
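A few common recovery steps (a sketch, assuming Ollama was installed with the official script and therefore runs as a systemd service):

```bash
# Restart the Ollama service if port 11434 is not listening
sudo systemctl restart ollama

# Inspect recent service logs for errors (e.g., out-of-memory when loading a large model)
journalctl -u ollama --no-pager -n 100

# Restart the Open WebUI container if the web interface stops responding
sudo docker restart open-webui
```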





