This article is the second in a series exploring how to control robots with LLMs:
Part 1 — The Genius Taxi Driver
Part 2 — OLLAMA — Getting Started Guide for AMD GPUs
Part 3 — LANGCHAIN — Getting Started Guide for AMD GPUs
Part 4 — Implementing Agentic AI in ROS2 for Robotics Control
Part 5 — Evaluating Tool Awareness of LLMs for robotic control
OLLAMA Overview
First and foremost, OLLAMA is an open-source tool that allows users to run open-source models locally, instead of in the cloud.
It provides the runtime to execute the models locally, on a setup with one or multiple GPUs.
It is very easy to install, and seamlessly supports NVIDIA and AMD GPUs.
Note that my system has the following two GPUs:
- AMD Radeon Pro W7900, with 48GB VRAM
- NVIDIA T400, with 2GB VRAM
Normally, I wouldn’t mention the smaller NVIDIA GPU in my system, except that it allows me to run a 72 billion parameter model in a distributed fashion across both GPUs.
Installing OLLAMA
Once the AMD GPU utilities are installed (the amdgpu driver and ROCm), OLLAMA is one of the easiest packages I have ever installed.
As described in the previous quick start guide, the following line installs OLLAMA on a Linux machine:
- curl -fsSL https://ollama.com/install.sh | sh
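If the installation succeeded, a quick sanity check is to ask the client for its version (the exact output format may vary by release):
- ollama --version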
Supported models are listed on the OLLAMA library web page: https://ollama.com/library
Most models come in several variants, as well as a default “latest” variant. The available variants can be seen by clicking on a specific model. As an example, clicking on GPT-OSS (OpenAI’s open-source model) reveals its 20b and 120b variants.
Downloading models is as easy as executing the following command:
- ollama pull {model}:{variant}
For the 20B version of the GPT-OSS model, this would be:
- ollama pull gpt-oss:20b
For the purpose of my “robot control” exploration, I downloaded a selection of models. My criteria for this choice of models were the following:
- support for Tools
- less than 48 billion parameters (to fit in the 48GB of my AMD GPU Radeon Pro W7900)
After realizing that the model files use fewer than 8 bits per parameter, and that NVTOP reports my AMD GPU actually has 45GB of VRAM available, I re-assessed my criteria to the following:
- a file size of less than 45GB
This allowed me to select the following two additional models in the 70 billion parameter range:
- deepseek-r1:70b — successfully runs on Radeon Pro W7900 (48GB)
- qwen2.5:72b — successfully runs on Radeon Pro W7900 (48GB) + NVIDIA T400 (2GB)
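To verify how a loaded model is actually split across devices, you can list the running models; the PROCESSOR column reports the CPU/GPU split:
- ollama ps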
In order to run an instance of a model, I used two terminals:
In the first terminal, I launch the OLLAMA server, as follows:
- ollama serve
In the second terminal, I launch the OLLAMA instance, as follows:
- ollama run {model}:{variant}
If you need to check which models you have downloaded, you can query the OLLAMA server with:
- ollama list
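Note that all of these CLI commands talk to the OLLAMA server over a local HTTP API (port 11434 by default), which can also be called directly. A minimal sketch, assuming the server is running and the llama3.2:1b model has been pulled:

    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b",
      "prompt": "Who are you?",
      "stream": false
    }'

With "stream" set to false, the server returns a single JSON object containing the full response, instead of a token-by-token stream.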
Searching for my Subject Matter Experts — Round 1
I compared the previously downloaded models using two prompts each:
- Who are you?
- Draw a 5-point star using turtlesim.
The first prompt is only used to load the model into GPU memory.
The second prompt is the real test, and serves to identify which models have knowledge about robots, and more specifically ROS2.
I then took the verbose output and ran some keyword searches to determine whether the models were producing acceptable results (a scripted version of this scan is sketched after the keyword list below).
Some of the keywords I monitored were:
- “import turtlesim” => indicates a hallucinated Python package
- “Twist”, “cmd_vel”, “linear.x”, “angular.z” => indicates a correct response
- “ROS1” and “rospy” => indicates a solution for ROS1
- “ROS2” and “rclpy” => indicates a solution for ROS2
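As mentioned above, this keyword scan is easy to script. Here is a minimal sketch using the official ollama Python client (pip install ollama); the model subset and the keyword-to-verdict mapping below are illustrative, not my exact test harness:

    # pip install ollama -- Python client for a locally running OLLAMA server
    import ollama

    MODELS = ["llama3.2:1b", "qwen2.5:72b"]  # illustrative subset
    PROMPT = "Draw a 5-point star using turtlesim."

    # keyword -> what its presence in the answer suggests
    SIGNALS = {
        "import turtlesim": "hallucinated Python package",
        "cmd_vel": "correct velocity topic",
        "rospy": "ROS1 solution",
        "rclpy": "ROS2 solution",
    }

    for model in MODELS:
        # first prompt only serves to load the model into GPU memory
        ollama.chat(model=model, messages=[{"role": "user", "content": "Who are you?"}])
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": PROMPT}])
        text = reply["message"]["content"]
        verdicts = [note for keyword, note in SIGNALS.items() if keyword in text]
        print(f"{model}: {verdicts or ['no known keywords found']}")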
Here are the results for two of the models:
If we analyze the previous two output parsing examples, we can clearly conclude that:
- the 1B parameter LLAMA3.2 model hallucinated a response using a non-existent “turtlesim” Python package.
- the 72B parameter QWEN2.5 model responded correctly, but for ROS1.
In fact, all the models that responded correctly provided a solution for ROS1 instead of ROS2.
For this reason, I had to make my second prompt more explicit.
Searching for my Subject Matter Experts — Round 2
After the Round 1 results, I repeated the same tests with an adjusted second prompt:
- Who are you?
- Draw a 5-point star using turtlesim using ROS2.
Here are the results for two of the models:
If we analyze the previous two output parsing examples, we can clearly conclude that:
- the 3B parameter LLAMA3.2 model hallucinated a response using a non-existent “turtlesim” Python package.
- the 20B parameter GPT-OSS model responded correctly for ROS2.
With the exception of DeepSeek and Mistral, all the correct responses were for ROS2.
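For reference, a correct ROS2 answer drives the turtlesim turtle by publishing geometry_msgs/Twist messages to the /turtle1/cmd_vel topic with rclpy. Below is a minimal, open-loop sketch of what such an answer typically looks like (my own illustration, not a verbatim model response; it assumes a sourced ROS2 environment with turtlesim_node running, and the timing-based approach only approximates the star):

    import math
    import time
    import rclpy
    from geometry_msgs.msg import Twist

    rclpy.init()
    node = rclpy.create_node("star_drawer")
    pub = node.create_publisher(Twist, "/turtle1/cmd_vel", 10)
    time.sleep(1.0)  # give the publisher time to connect to turtlesim

    def move(linear, angular, duration):
        """Publish a constant velocity for `duration` seconds, then stop."""
        msg = Twist()
        msg.linear.x = linear
        msg.angular.z = angular
        end = time.time() + duration
        while time.time() < end:
            pub.publish(msg)
            time.sleep(0.05)
        pub.publish(Twist())  # zero velocity = stop

    for _ in range(5):
        move(2.0, 0.0, 1.0)                # draw one edge of the star
        move(0.0, math.radians(144), 1.0)  # exterior turn of a 5-point star

    node.destroy_node()
    rclpy.shutdown()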
If we compare the results from both rounds, we see several good candidates to be used as subject matter experts (SMEs) for our robot control exploration.
We can also note the benchmarks of these models on AMD’s Radeon Pro W7900 GPU (how these rates were measured is sketched after the list):
- 70B parameter models => ~12 tokens/sec
- 32B parameter models => ~20 tokens/sec
- 7B-8B parameter models => ~56–68 tokens/sec
- 3B parameter models => ~100 tokens/sec
- 1B parameter models => ~164 tokens/sec
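These throughput figures correspond to the “eval rate” line in OLLAMA’s verbose timing statistics. Assuming you want to reproduce them, run any model with the --verbose flag, for example:
- ollama run gpt-oss:20b --verbose
After each response, the statistics report the prompt evaluation rate and the generation (“eval”) rate in tokens per second.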
In this article, we provided an overview of OLLAMA, showed how to install it on a Linux machine equipped with an AMD Radeon Pro W7900 GPU, and measured some benchmarks for various model sizes.
We also performed our first rounds of validation of open-source models, for use with our intended ROS2 robotics application.
In the next article, we will look at another open-source package called LANGCHAIN. This layer will provide us with additional capabilities, including defining tools for use with our robotics application.
Version History
2025/11/10 - Initial Version