This article is the second in a series exploring how to control robots with LLMs:
Part 1 — The Genius Taxi Driver
Part 2 — OLLAMA — Getting Started Guide for AMD GPUs
Part 3 — LANGCHAIN — Getting Started Guide for AMD GPUs
Part 4 — Implementing Agentic AI in ROS2 for Robotics Control
Part 5 — Evaluating Tool Awareness of LLMs for robotic control
OLLAMA Overview
First and foremost, OLLAMA is an open-source tool that allows users to run open-source models locally, instead of in the cloud.
It provides the runtime to execute the models locally, on a setup with one or multiple GPUs.
It is very easy to install, and seamlessly supports NVIDIA and AMD GPUs.
Note that my system has the following two GPUs:
- AMD Radeon Pro W7900, with 48GB VRAM
- NVIDIA T400, with 2GB VRAM
Normally, I wouldn’t mention the smaller NVIDIA GPU in my system, except that it allows me to run a 72 billion parameter model in a distributed fashion across both GPUs.
Installing OLLAMA
Once the AMD GPU utilities are installed (the amdgpu driver and ROCm), OLLAMA is one of the easiest packages I have ever installed.
As described in the previous quick start guide, the following line installs OLLAMA on a Linux machine:
- curl -fsSL https://ollama.com/install.sh | sh
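If the installation succeeded, a quick sanity check is to ask the client for its version (the exact output format may vary by release):
- ollama --version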
Supported models are listed on the OLLAMA library web page: https://ollama.com/library
Most models come in several variants, as well as a default “latest” variant. The available variants can be seen by clicking on a specific model. As an example, clicking on GPT-OSS (OpenAI’s open-source model) reveals its 20b and 120b variants.
Downloading models is as easy as executing the following command:
- ollama pull {model}:{variant}
For the 20B version of the GPT-OSS model, this would be:
- ollama pull gpt-oss:20b
For the purpose of my “robot control” exploration, I downloaded a selection of models. My criteria for this choice of models were the following:
- support for Tools
- less than 48 billion parameters (to fit in the 48GB of my AMD GPU Radeon Pro W7900)
After realizing that the model files use fewer than 8 bits per parameter, and that NVTOP reports my AMD GPU actually has 45GB of VRAM available, I re-assessed my criteria to the following:
- a file size of less than 45GB
This allowed me to select the following two additional models in the 70 billion parameter range:
- deepseek-r1:70b — successfully runs on Radeon Pro W7900 (48GB)
- qwen2.5:72b — successfully runs on Radeon Pro W7900 (48GB) + NVIDIA T400 (2GB)
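To verify how a loaded model is actually split across devices, you can list the running models; the PROCESSOR column reports the CPU/GPU split:
- ollama ps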
In order to run an instance of a model, I used two terminals:
In the first terminal, I launch the OLLAMA server, as follows:
- ollama serve
In the second terminal, I launch the OLLAMA instance, as follows:
- ollama run {model}:{variant}
If you need to check which models you have downloaded, you can query the OLLAMA server with:
- ollama list
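Note that all of these CLI commands talk to the OLLAMA server over a local HTTP API (port 11434 by default), which can also be called directly. A minimal sketch, assuming the server is running and the llama3.2:1b model has been pulled:

    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b",
      "prompt": "Who are you?",
      "stream": false
    }'

With "stream" set to false, the server returns a single JSON object containing the full response, instead of a token-by-token stream.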
Searching for my Subject Matter Experts — Round 1
I compared the previously downloaded models using two prompts each:
- Who are you?
- Draw a 5-point star using turtlesim.
The first prompt is only used to load the model into GPU memory.
The second prompt is the real test, and serves to identify which models have knowledge about robots, and more specifically ROS2.
I then took the verbose output and ran some keyword searches to determine whether the models were producing acceptable results (a scripted version of this scan is sketched after the keyword list below).
Some of the keywords I monitored were:
- “import turtlesim” => indicates a hallucinated Python package
- “Twist”, “cmd_vel”, “linear.x”, “angular.z” => indicates a correct response
- “ROS1” and “rospy” => indicates a solution for ROS1
- “ROS2” and “rclpy” => indicates a solution for ROS2
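As mentioned above, this keyword scan is easy to script. Here is a minimal sketch using the official ollama Python client (pip install ollama); the model subset and the keyword-to-verdict mapping below are illustrative, not my exact test harness:

    # pip install ollama -- Python client for a locally running OLLAMA server
    import ollama

    MODELS = ["llama3.2:1b", "qwen2.5:72b"]  # illustrative subset
    PROMPT = "Draw a 5-point star using turtlesim."

    # keyword -> what its presence in the answer suggests
    SIGNALS = {
        "import turtlesim": "hallucinated Python package",
        "cmd_vel": "correct velocity topic",
        "rospy": "ROS1 solution",
        "rclpy": "ROS2 solution",
    }

    for model in MODELS:
        # first prompt only serves to load the model into GPU memory
        ollama.chat(model=model, messages=[{"role": "user", "content": "Who are you?"}])
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": PROMPT}])
        text = reply["message"]["content"]
        verdicts = [note for keyword, note in SIGNALS.items() if keyword in text]
        print(f"{model}: {verdicts or ['no known keywords found']}")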
Here are the results for two of the models:
If we analyze the previous two output parsing examples, we can clearly conclude that:
- the 1B parameter LLAMA3.2 model hallucinated a response using a non-existent “turtlesim” Python package.
- the 72B parameter QWEN2.5 model responded correctly, but for ROS1.
In fact, all the models that responded correctly provided a solution for ROS1 instead of ROS2.
For this reason, I had to make my second prompt more explicit.
Searching for my Subject Matter Experts — Round 2
After the Round 1 results, I repeated the same tests with an adjusted second prompt:
- Who are you?
- Draw a 5-point star using turtlesim using ROS2.
Here are the results for two of the models:
If we analyze the previous two output parsing examples, we can clearly conclude that:
- the 3B parameter LLAMA3.2 model hallucinated a response using a non-existent “turtlesim” Python package.
- the 20B parameter GPT-OSS model responded correctly for ROS2.
With the exception of DeepSeek and Mistral, all the correct responses were for ROS2.
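For reference, a correct ROS2 answer drives the turtlesim turtle by publishing geometry_msgs/Twist messages to the /turtle1/cmd_vel topic with rclpy. Below is a minimal, open-loop sketch of what such an answer typically looks like (my own illustration, not a verbatim model response; it assumes a sourced ROS2 environment with turtlesim_node running, and the timing-based approach only approximates the star):

    import math
    import time
    import rclpy
    from geometry_msgs.msg import Twist

    rclpy.init()
    node = rclpy.create_node("star_drawer")
    pub = node.create_publisher(Twist, "/turtle1/cmd_vel", 10)
    time.sleep(1.0)  # give the publisher time to connect to turtlesim

    def move(linear, angular, duration):
        """Publish a constant velocity for `duration` seconds, then stop."""
        msg = Twist()
        msg.linear.x = linear
        msg.angular.z = angular
        end = time.time() + duration
        while time.time() < end:
            pub.publish(msg)
            time.sleep(0.05)
        pub.publish(Twist())  # zero velocity = stop

    for _ in range(5):
        move(2.0, 0.0, 1.0)                # draw one edge of the star
        move(0.0, math.radians(144), 1.0)  # exterior turn of a 5-point star

    node.destroy_node()
    rclpy.shutdown()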
If we compare the results from both rounds, we see several good candidates to be used as subject matter experts (SMEs) for our robot control exploration.
We can also note the benchmarks of these models on AMD’s Radeon Pro W7900 GPU (how these rates were measured is sketched after the list):
- 70B parameter models => ~12 tokens/sec
- 32B parameter models => ~20 tokens/sec
- 7B-8B parameter models => ~56–68 tokens/sec
- 3B parameter models => ~100 tokens/sec
- 1B parameter models => ~164 tokens/sec
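These throughput figures correspond to the “eval rate” line in OLLAMA’s verbose timing statistics. Assuming you want to reproduce them, run any model with the --verbose flag, for example:
- ollama run gpt-oss:20b --verbose
After each response, the statistics report the prompt evaluation rate and the generation (“eval”) rate in tokens per second.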
In this article, we provided an overview of OLLAMA, showed how to install it on a Linux machine equipped with an AMD Radeon Pro W7900 GPU, and measured some benchmarks for various model sizes.
We also performed our first rounds of validation of open-source models, for use with our intended ROS2 robotics application.
In the next article, we will look at another open-source package called LANGCHAIN. This layer will provide us with additional capabilities, including defining tools for use with our robotics application.
Version History
2025/11/10 - Initial Version