In this project, we'll design and implement an AI agent architecture that leverages multiple large language models (LLMs) distributed across different hardware, intelligently routing user queries to the most appropriate model based on complexity, latency, and computational requirements. At the heart of this system is a router agent: a decision-making layer powered by its own LLM inference that interprets user intent and delegates tasks efficiently.
We'll be using the powerful NVIDIA Jetson AGX Thor Developer Kit as our primary edge AI platform, complemented by a machine with an NVIDIA GeForce RTX 5090 GPU for auxiliary inference workloads. This setup enables us to balance performance and responsiveness by dynamically assigning queries to either a lightweight or heavyweight LLM based on prompt understanding. The system is orchestrated by n8n, secured via Tailscale, and powered by llama.cpp as the inference server engine.
The system is built around the NVIDIA Jetson AGX Thor Developer Kit, a high-performance single-board computer optimized for edge AI applications. With 128 GB of unified memory, it's uniquely capable of running massive, state-of-the-art open-source models like OpenAI's GPT-OSS-120B and GPT-OSS-20B, which are simply too large to run on a typical standalone PC with limited VRAM.
Here is a photo of the NVIDIA Jetson AGX Thor developer kit:
The system leverages a powerful combination of edge and desktop computing hardware. An NVIDIA GeForce RTX 5090 GPU is also used to run a more efficient, quantized model, ensuring fast response times. This dual-hardware approach keeps the system both powerful and responsive, addressing the limitations of relying on a single piece of hardware.
Without further ado, let's get started!
Hardware architecture

Here's a brief overview of the scenario I would like to create:

The n8n workflow automation tool is the brain of the operation, managing all logic and communication. The system consists of two main physical nodes connected via a Local Area Network (LAN) and securely exposed to the internet through a Tailscale Funnel. The Funnel exposes the n8n web service (running on the RTX 5090 workstation) to the public Internet, providing a secure HTTPS URL that allows services like Telegram or a web frontend to send user queries into our private tailnet without complex port forwarding or security risks. The models are served using llama.cpp, optimized for the ARM-based architecture of the Jetson platform.
The Router AI Agent: Intelligent Query Orchestration

The intelligent decision-making at the heart of this system is performed by the Router Agent. We deploy a lightweight, quantization-aware trained (QAT) version of Google's Gemma model, gemma-3-4b-it-qat. Despite its small size, it is instruction-tuned and highly capable of intent classification and routing logic.

Simple Queries: If the query is determined to be straightforward (e.g., factual questions, simple definitions, brief summarizations), the Router directs it to the efficient gemma-3-12b-it-qat model running on the RTX 5090.

Complex Queries: If the query is identified as complex (e.g., requiring multi-step reasoning, in-depth analysis, creative generation, or nuanced discussion), the Router directs it to the powerful openai_gpt-oss-120b model running on the Jetson AGX Thor.
We'll use llama.cpp to serve our models efficiently. Follow these steps to build it with CUDA support. Clone the llama.cpp repository from GitHub. This repository contains all the necessary source code and build scripts:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Once inside the llama.cpp directory, you'll need to configure the build using CMake. The following cmake command sets up the build environment:
cmake -B build -DGGML_CUDA=ON
After configuring the build with CMake, you can compile the project using the following command:
cmake --build build --config Release -j $(nproc)
The server binary will be located at ./build/bin/llama-server. This binary will serve as the backbone for all our LLMs.
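As a quick, optional sanity check (the exact flags and output vary between llama.cpp versions), confirm the binary runs and that the CUDA backend was compiled in:
./build/bin/llama-server --version
Recent builds also support listing the detected compute devices, which should show your GPU if -DGGML_CUDA=ON took effect:
./build/bin/llama-server --list-devices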
We need to expose our n8n automation server securely, without opening firewall holes or managing TLS certificates.
First, you'll need to install the Tailscale client on your machine. Run the following command in your terminal:
curl -fsSL https://tailscale.com/install.sh | sh
After the installation is complete, run the following command to connect your machine to your Tailscale network:
sudo tailscale up
You should see a message confirming the login was successful. At this point, your machine is part of your private Tailscale network, or tailnet.
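As an optional check, you can list the devices in your tailnet along with their Tailscale IPs and hostnames:
tailscale status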
Then, enable the Funnel for port 5678:
sudo tailscale funnel 5678
You'll get a public URL like:
https://your-machine-name.tailnet-hash.ts.net
This is how Telegram (or any external service) will securely reach your local n8n instance - no port forwarding, no dynamic DNS, no stress.
Part 3: Deploying n8n for Workflow Automation

n8n is a powerful, open-source workflow automation tool that will act as the brain of your AI agent, connecting Telegram, LLMs, and vector databases.
Create a project directory to store your n8n environment configuration and Docker Compose files, then navigate into it:
mkdir n8n-compose
cd n8n-compose
Create a local files directory:
mkdir local-files
Find your machine's LAN IP (pick one interface/IP you use at home):
hostname -I
Create compose.yaml:
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    restart: always
    ports:
      - "5678:5678" # reachable at http://<LAN_IP>:5678
    environment:
      - N8N_HOST=<LAN_IP> # put your LAN IP or hostname here
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=https://your-machine-name.tailnet-hash.ts.net # tailscale funnel
      - N8N_RUNNERS_ENABLED=true
      - N8N_SECURE_COOKIE=false
      - GENERIC_TIMEZONE=Asia/Almaty
      - TZ=Asia/Almaty
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n
      - ./local-files:/files

volumes:
  n8n_data:
Start n8n by typing:
sudo docker compose up -d
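Optionally, follow the container logs to confirm n8n came up cleanly (the exact log lines depend on your n8n version):
sudo docker compose logs -f n8n
Once n8n reports that the editor is accessible, you're good to go.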
Open your browser and go to:
http://<workstation-ip>:5678
To create a Funnel, use the tailscale funnel command and pass the target port you want to share:
sudo tailscale funnel 5678
You'll see output like:
Available on the internet:
https://your-machine-name.tailnet-hash.ts.net
|-- proxy http://127.0.0.1:5678
Press Ctrl+C to exit.
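A quick way to confirm the Funnel is forwarding traffic is to request the public URL from any device, even one outside your network (substitute the hostname from your own Funnel output):
curl -I https://your-machine-name.tailnet-hash.ts.net
A successful HTTP response means external services like Telegram can now reach your n8n instance.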
On the Workstation (RTX 5090):
First, start the Router Agent LLM (Gemma 3 4B QAT) using the llama.cpp server:
./build/bin/llama-server -hf ggml-org/gemma-3-4b-it-qat-GGUF -c 0 -fa on --jinja --host 0.0.0.0 --port 9000
Start the Lightweight LLM (Gemma 3 12B QAT):
./build/bin/llama-server -hf ggml-org/gemma-3-12b-it-qat-GGUF -c 0 -fa on --jinja --host 0.0.0.0 --port 10000
On the Jetson AGX Thor:
Start the Heavyweight LLM (GPT-OSS 120B):
./build/bin/llama-server -hf bartowski/openai_gpt-oss-120b-GGUF --jinja --host 0.0.0.0 --port 9000
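Each llama-server instance exposes an OpenAI-compatible API. As a smoke test, you can send a chat request with curl (the host and port below assume the Router Agent instance on the workstation; adjust for your setup):
curl http://<workstation-ip>:9000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
A JSON reply containing a generated message confirms the server is ready for n8n to call.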
Pro Tip: Use different ports to avoid conflicts.

Part 4: Simple Demo - Chat Trigger + Routing Agent
Here is a simple example of an n8n workflow where a user can interact with LLMs via an n8n chat trigger. The routing agent, a small and fast LLM (Gemma 3 4B QAT), analyzes each incoming user query and decides which model should handle it.
This agent's only job is to analyze the query and output structured JSON like:
{
"prompt": "Whatβs the capital of France?",
"model": "gemma-3-12b-it-qat-GGUF"
}
Or, for a more complex question:
{
"prompt": "Compare transformer architectures in LLMs and explain trade-offs for edge deployment.",
"model": "openai_gpt-oss-120b"
}
Below is a screenshot of the sample output.
Let's look at a more complex demo app.
Part 5: Real-World Demo: Telegram AI Assistant

Now let's scale this into a real-world application. Users can interact naturally via the Telegram messenger:
The magic happens in n8n's workflow. Here's how it works:
- Telegram Trigger: Listens for user messages.
- Routing Agent (Gemma 3 4B QAT): Receives the raw query. Its only job is to output structured JSON:
{
"prompt": "user's original prompt",
"model": "gemma-3-12b-it-qat-GGUF" // or "openai_gpt-oss-120b"
}
- Switch Node: Routes execution based on the model field.
- AI Agent Nodes: Each connected to its respective LLM endpoint (via OpenAI-compatible API).
- Memory & Tools: Each agent has its own memory buffer and optional tools (calculator, Wikipedia, web search).
- Telegram Response: Sends the final answer back to the user.
Import the workflow JSON (provided below) into n8n. This workflow wires together the trigger, routing agent, switch, AI agent, and Telegram response nodes described above:
{
"nodes": [
{
"parameters": {
"schemaType": "manual",
"inputSchema": "{\n\t\"type\": \"object\",\n\t\"properties\": {\n\t\t\"prompt\": {\n\t\t\t\"type\": \"string\"\n\t\t},\n\t\t\"model\": {\n\t\t\t\"type\": \"string\"\n\t\t}\n\t}\n}"
},
"id": "cc1002dd-c4de-4bac-96d5-97e849ae14d3",
"name": "Structured Output Parser",
"type": "@n8n/n8n-nodes-langchain.outputParserStructured",
"position": [
-272,
304
],
"typeVersion": 1.2
},
{
"parameters": {},
"id": "b23223ba-37a7-4efd-b99b-137c47cd624b",
"name": "Think",
"type": "@n8n/n8n-nodes-langchain.toolThink",
"position": [
368,
400
],
"typeVersion": 1
},
{
"parameters": {},
"id": "84061271-0779-4289-9c14-d365f0a8ea9a",
"name": "Calculator",
"type": "@n8n/n8n-nodes-langchain.toolCalculator",
"position": [
496,
400
],
"typeVersion": 1
},
{
"parameters": {
"sessionIdType": "customKey",
"sessionKey": "=chat_with_{{ $('Listen for incoming events').first().json.message.chat.id }}"
},
"id": "3bcc650f-0ba0-4ef9-8817-559c1a529846",
"name": "Simple Memory1",
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"position": [
256,
400
],
"typeVersion": 1.3
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.output.prompt }}",
"options": {
"systemMessage": "=You have access to a web_search tool that allows you to browse the internet for up-to-date information. Here's how you should operate:\n\n1. Website Information:\nFamiliarize yourself with this information about the website you're assisting. Use this as context for user interactions.\n\n2. Web Search Tool:\nYou have access to a web_search tool that can browse the internet. To use it, write the variable {web_search_question}. The tool will return relevant search results. Set the variable {model} to {{ $json.output.model }}.\n\n3. Handling User Queries:\nWhen a user asks a question, follow these steps:\na) Analyze the query to determine if it's related to the website or requires external information.\nb) If the query is about the website, use the provided website information to answer.\nc) If external information is needed, use the web_search tool to find relevant data.\n\n4. Using web_search:\n- Use web_search for factual, current information that isn't provided in the website info.\n- Formulate clear, concise search queries.\n- If the first search doesn't yield useful results, refine your query and search again.\n- Limit searches to a maximum of three per user query to maintain efficiency.\n\n5. Using Think:\n- Using Think tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.\n\n6. Formulating Responses:\n- Begin with information from the website if relevant.\n- Incorporate web search results to provide up-to-date, accurate information.\n- Summarize findings concisely and coherently.\n- If you're unsure or can't find reliable information, be honest about limitations.\n\n7. Ethical Considerations:\n- Respect user privacy. Don't ask for or store personal information.\n- Provide factual information. Avoid speculation or unverified claims.\n- If asked about controversial topics, strive for a balanced, neutral response.\n- Don't engage in or encourage illegal activities.\n\n8. Output Format:\nDo not include your thought process, web searches, or any other tags in the final output.\n"
}
},
"id": "73f26cd6-8e3a-4987-b3f0-71b12e29a2a0",
"name": "AI Agent1",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
384,
224
],
"typeVersion": 1.9
},
{
"parameters": {},
"type": "@n8n/n8n-nodes-langchain.toolWikipedia",
"typeVersion": 1,
"position": [
624,
400
],
"id": "f7d637b3-2542-40d2-b003-351e52ecf385",
"name": "Wikipedia1"
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.output.prompt }}",
"options": {
"systemMessage": "=You have access to a web_search tool that allows you to browse the internet for up-to-date information. Here's how you should operate:\n\n1. Website Information:\nFamiliarize yourself with this information about the website you're assisting. Use this as context for user interactions.\n\n2. Web Search Tool:\nYou have access to a web_search tool that can browse the internet. To use it, write the variable {web_search_question}. The tool will return relevant search results. Set the variable {model} to {{ $json.output.model }}.\n\n3. Handling User Queries:\nWhen a user asks a question, follow these steps:\na) Analyze the query to determine if it's related to the website or requires external information.\nb) If the query is about the website, use the provided website information to answer.\nc) If external information is needed, use the web_search tool to find relevant data.\n\n4. Using web_search:\n- Use web_search for factual, current information that isn't provided in the website info.\n- Formulate clear, concise search queries.\n- If the first search doesn't yield useful results, refine your query and search again.\n- Limit searches to a maximum of three per user query to maintain efficiency.\n\n5. Using Think:\n- Using Think tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.\n\n6. Formulating Responses:\n- Begin with information from the website if relevant.\n- Incorporate web search results to provide up-to-date, accurate information.\n- Summarize findings concisely and coherently.\n- If you're unsure or can't find reliable information, be honest about limitations.\n\n7. Ethical Considerations:\n- Respect user privacy. Don't ask for or store personal information.\n- Provide factual information. Avoid speculation or unverified claims.\n- If asked about controversial topics, strive for a balanced, neutral response.\n- Don't engage in or encourage illegal activities.\n\n8. Output Format:\nDo not include your thought process, web searches, or any other tags in the final output.\n"
}
},
"id": "61936908-d4dd-4695-9fd9-9e876f029cf3",
"name": "AI Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
384,
-192
],
"typeVersion": 1.9
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.message.text }}",
"hasOutputParser": true,
"options": {
"systemMessage": "=You are a **Routing Agent**.\n\nYour task is to analyze user queries and determine the most appropriate model to handle each specific use case.\n\n## Available Models\n\nYou have access to the following models:\n\n1. **gemma-3-12b-it-qat-GGUF**\n2. **openai_gpt-oss-120b**\n\n## Model Strengths\n\n### 1. gemma-3-12b-it-qat-GGUF\n- Standard decision-making tasks\n- Real-time workflow routing\n- Data validation and processing\n- Pattern recognition in structured data\n- Routine business logic evaluation\n\n### 2. openai_gpt-oss-120b\n- Complex multi-factor decision scenarios\n- Advanced data analysis requiring deep reasoning\n- Critical business decisions with high impact\n- Complex pattern recognition in unstructured data\n- Strategic workflow optimization\n\n## Output Format\n\nYour output must always be a valid JSON object in the following format:\n\n```json\n{\n \"prompt\": \"user query goes here\",\n \"model\": \"selected-model-name\"\n}\n```\n\n- The **\"prompt\"** field should contain the exact query to be sent to the selected model.\n- The **\"model\"** field should contain the model name (one of: gemma-3-12b-it-qat-GGUF, openai_gpt-oss-120b).\n\n**Important:** Only return the JSON object. Do not include any explanations or additional text."
}
},
"id": "8b5269fb-3ea0-419f-94ee-502c6ed42368",
"name": "Routing Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
-432,
128
],
"typeVersion": 1.9
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.output.model }}",
"rightValue": "gemma-3-12b-it-qat-GGUF",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "1abce04e-a3a1-49d7-bfae-dca52e3a8354"
}
],
"combinator": "and"
}
},
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "a30e8884-f149-459b-8a7f-6265b9d4dd5e",
"leftValue": "={{ $json.output.model }}",
"rightValue": "openai_gpt-oss-120b",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
-144,
128
],
"id": "7dbffd4f-b66e-46d3-bd46-9adc921c9eef",
"name": "Switch"
},
{
"parameters": {
"content": "## NVIDIA GeForce RTX 5090 Workstation\n\n",
"height": 399,
"width": 656,
"color": 4
},
"id": "a5fab0f3-cfe3-4eaf-8366-8574ebf099ce",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
96,
-256
],
"typeVersion": 1
},
{
"parameters": {
"content": "## Nvidia Jetson AGX Thor Developer Kit\n\n",
"height": 400,
"width": 659
},
"id": "cfae9f47-8886-46a9-a2c7-27ef781a9839",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
96,
160
],
"typeVersion": 1
},
{
"parameters": {
"content": "## Dynamic Model Routing for Optimal AI Responses\n\nThe Routing agent is a dynamic, AI-powered routing system that automatically selects the most appropriate large language model (LLM) to respond to a user's query based on the query's content and purpose. The Router uses lightweight and highly efficient gemma-3-4b-it-qat, optimized for low-latency inference and hosted locally on the NVIDIA RTX 5090 workstation. \n\n**Simple Queries** \nFor straightforward requests β such as factual lookups, basic definitions, or short summarizations β the Router assigns the task to the lightweight and efficient gemma-3-12b-it-qat model, optimized for speed and running locally on the RTX 5090. \n\n**Complex Queries**\nFor advanced tasks requiring multi-step reasoning, deep analysis, creative ideation, or nuanced contextual understanding β the Router escalates the request to the high-capacity openai_gpt-oss-120b model, hosted on the powerful NVIDIA Jetson AGX Thor platform for maximum performance. \n",
"height": 819,
"width": 775,
"color": 3
},
"id": "9a01437c-f003-4ed4-b598-513f13b48a36",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-704,
-256
],
"typeVersion": 1
},
{
"parameters": {
"updates": [
"message"
],
"additionalFields": {}
},
"id": "124b2cfa-d6af-4569-8f08-cff9fe64cafd",
"name": "Listen for incoming events",
"type": "n8n-nodes-base.telegramTrigger",
"position": [
-656,
128
],
"webhookId": "322dce18-f93e-4f86-b9b1-3305519b7834",
"typeVersion": 1,
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
"text": "={{ $json.output }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "16e73b18-3173-49c9-9d2a-9c6e7b27bee7",
"name": "Telegram",
"type": "n8n-nodes-base.telegram",
"position": [
800,
64
],
"typeVersion": 1.1,
"webhookId": "df3e62fe-25fb-481f-a4ad-255a167818bb",
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
},
"onError": "continueErrorOutput"
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
"text": "={{ $('AI Agent').item.json.output.replace(/&/g, \"&\").replace(/>/g, \">\").replace(/</g, \"<\").replace(/\"/g, \""\") }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "ddc949cc-c29e-49d2-9efb-24854413cd34",
"name": "Correct errors",
"type": "n8n-nodes-base.telegram",
"position": [
976,
192
],
"typeVersion": 1.1,
"webhookId": "cd30e054-0370-4aef-b7bf-483c1e73c449",
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"content": "## Telegram messenger\n\n",
"height": 815,
"width": 400,
"color": 6
},
"id": "e3fb7971-90be-4008-8c9b-9f44c85ee4ae",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
768,
-256
],
"typeVersion": 1
},
{
"parameters": {
"sessionIdType": "customKey",
"sessionKey": "=chat_with_{{ $('Listen for incoming events').first().json.message.chat.id }}"
},
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"typeVersion": 1.3,
"position": [
480,
-16
],
"id": "0b2e9296-c249-4d4b-9dc1-611a7bbd950e",
"name": "Simple Memory"
},
{
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4.1-mini"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
-528,
304
],
"id": "ff67cbbf-00dd-4e64-936b-129077ae4319",
"name": "gemma-3-4b-it-qat",
"credentials": {
"openAiApi": {
"id": "yv0BO6gWekx7HdgZ",
"name": "Routing LLM"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4.1-mini"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
320,
-16
],
"id": "fa38fb9f-02c9-4adc-b151-b6e6d9ae4d96",
"name": "gemma-3-12b-it-qat",
"credentials": {
"openAiApi": {
"id": "qdAOjjK7YVwe17tU",
"name": "Poor LLM"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"value": "{{ $json.output.model }}",
"mode": "id"
},
"options": {
"maxTokens": 500
}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
128,
400
],
"id": "bd1ee1ca-053f-4421-a8cd-135c827ab4f3",
"name": "openai_gpt-oss-120b",
"credentials": {
"openAiApi": {
"id": "erOxMyvS0PxB4Hm4",
"name": "LLM"
}
}
}
],
"connections": {
"Structured Output Parser": {
"ai_outputParser": [
[
{
"node": "Routing Agent",
"type": "ai_outputParser",
"index": 0
}
]
]
},
"Think": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"Calculator": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"Simple Memory1": {
"ai_memory": [
[
{
"node": "AI Agent1",
"type": "ai_memory",
"index": 0
}
]
]
},
"AI Agent1": {
"main": [
[
{
"node": "Telegram",
"type": "main",
"index": 0
}
]
]
},
"Wikipedia1": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"AI Agent": {
"main": [
[
{
"node": "Telegram",
"type": "main",
"index": 0
}
]
]
},
"Routing Agent": {
"main": [
[
{
"node": "Switch",
"type": "main",
"index": 0
}
]
]
},
"Switch": {
"main": [
[
{
"node": "AI Agent",
"type": "main",
"index": 0
}
],
[
{
"node": "AI Agent1",
"type": "main",
"index": 0
}
]
]
},
"Listen for incoming events": {
"main": [
[
{
"node": "Routing Agent",
"type": "main",
"index": 0
}
]
]
},
"Telegram": {
"main": [
[],
[
{
"node": "Correct errors",
"type": "main",
"index": 0
}
]
]
},
"Simple Memory": {
"ai_memory": [
[
{
"node": "AI Agent",
"type": "ai_memory",
"index": 0
}
]
]
},
"gemma-3-4b-it-qat": {
"ai_languageModel": [
[
{
"node": "Routing Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"gemma-3-12b-it-qat": {
"ai_languageModel": [
[
{
"node": "AI Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"openai_gpt-oss-120b": {
"ai_languageModel": [
[
{
"node": "AI Agent1",
"type": "ai_languageModel",
"index": 0
}
]
]
}
},
"pinData": {},
"meta": {
"instanceId": "02c687c2c688456cd1989d281e369e5238e96172a8448f02fdd7e663783000c4"
}
}
You'll need to configure credentials:
- Telegram Bot API (for trigger + response)
- OpenAI-compatible endpoints for each LLM using llama.cpp.
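For the llama.cpp endpoints, create OpenAI credentials in n8n whose Base URL points at each llama-server instance; the API key can be any non-empty placeholder, since llama-server only enforces a key when started with --api-key. For example (assuming the hosts and ports used earlier):
- Routing LLM: http://<workstation-ip>:9000/v1
- Lightweight LLM: http://<workstation-ip>:10000/v1
- Heavyweight LLM: http://<jetson-ip>:9000/v1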
This workflow is fully modular.
You now have a fully functional, multi-node, dynamically routed AI agent system built using n8n and Telegram Messenger. Whether you're building a smart assistant, an industrial AI agent, or an autonomous robotics controller, this routing architecture ensures your system remains responsive, efficient, and intelligent.
Smart routing can reduce LLM costs and improve performance without sacrificing quality.
I hope you found this guide useful, and thanks for reading. If you have any questions or feedback, leave a comment below. If you liked this post, please support me by subscribing to my Hackster blog.
Stay tuned!