In this project, we'll design and implement an AI agent architecture that leverages multiple large language models (LLMs) distributed across different hardware, intelligently routing user queries to the most appropriate model based on complexity, latency, and computational requirements. At the heart of this system is a router agent: a decision-making layer powered by its own LLM inference that interprets user intent and delegates tasks efficiently.
We'll be using the powerful NVIDIA Jetson AGX Thor Developer Kit as our primary edge AI platform, complemented by a machine with an NVIDIA GeForce RTX 5090 GPU for auxiliary inference workloads. This setup enables us to balance performance and responsiveness by dynamically assigning queries to either a lightweight or heavyweight LLM based on prompt understanding. The system is orchestrated by n8n, secured via Tailscale, and powered by llama.cpp as the inference server engine.
The system is built around the NVIDIA Jetson AGX Thor Developer Kit, a high-performance single-board computer optimized for edge AI applications. With 128 GB of unified memory, it's uniquely capable of running massive, state-of-the-art open-source models like OpenAI's GPT-OSS-120B and GPT-OSS-20B, which are simply too large to run on a typical standalone PC with limited VRAM.
Here is a photo of the NVIDIA Jetson AGX Thor developer kit:
The system leverages a powerful combination of edge and desktop computing hardware. An NVIDIA GeForce RTX 5090 GPU is also used to run a more efficient, quantized model, ensuring fast response times. This dual-hardware approach keeps the system both powerful and responsive, addressing the limitations of relying on a single piece of hardware.
Without further ado, let's get started!
Hardware architecture

Here's a brief overview of the scenario I would like to create:

The n8n workflow automation tool is the brain of the operation, managing all logic and communication. The system consists of two main physical nodes connected via a Local Area Network (LAN) and securely exposed to the internet through a Tailscale Funnel. The Funnel exposes the n8n web service (running on the RTX 5090 workstation) to the public Internet, providing a secure HTTPS URL that allows services like Telegram or a web frontend to send user queries into our private tailnet without complex port forwarding or security risks. The models are served using llama.cpp, optimized for the ARM-based architecture of the Jetson platform.
The Router AI Agent: Intelligent Query Orchestration

The intelligent decision-making at the heart of this system is performed by the Router Agent. We deploy a lightweight, quantization-aware trained (QAT) version of Google's Gemma model, gemma-3-4b-it-qat. Despite its small size, it is instruction-tuned and highly capable of intent classification and routing logic.

Simple Queries: If the query is determined to be straightforward (e.g., factual questions, simple definitions, brief summarizations), the Router directs it to the efficient gemma-3-12b-it-qat model running on the RTX 5090.

Complex Queries: If the query is identified as complex (e.g., requiring multi-step reasoning, in-depth analysis, creative generation, or nuanced discussion), the Router directs it to the powerful openai_gpt-oss-120b model running on the Jetson AGX Thor.
We'll use llama.cpp to serve our models efficiently. Follow these steps to build it with CUDA support. Clone the llama.cpp repository from GitHub. This repository contains all the necessary source code and build scripts:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Once inside the llama.cpp directory, you'll need to configure the build using CMake. The following cmake command sets up the build environment:
cmake -B build -DGGML_CUDA=ON
After configuring the build with CMake, you can compile the project using the following command:
cmake --build build --config Release -j $(nproc)
The server binary will be located at ./build/bin/llama-server. This binary will serve as the backbone for all our LLMs.
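As a quick, optional sanity check (the exact flags and output vary between llama.cpp versions), confirm the binary runs and that the CUDA backend was compiled in:
./build/bin/llama-server --version
Recent builds also support listing the detected compute devices, which should show your GPU if -DGGML_CUDA=ON took effect:
./build/bin/llama-server --list-devices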
We need to expose our n8n automation server securely, without opening firewall holes or managing TLS certificates.
First, you'll need to install the Tailscale client on your machine. Run the following command in your terminal:
curl -fsSL https://tailscale.com/install.sh | sh
After the installation is complete, run the following command to connect your machine to your Tailscale network:
sudo tailscale up
You should see a message confirming the login was successful. At this point, your machine is part of your private Tailscale network, or tailnet.
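As an optional check, you can list the devices in your tailnet along with their Tailscale IPs and hostnames:
tailscale status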
Then, enable the Funnel for port 5678:
sudo tailscale funnel 5678
You'll get a public URL like:
https://your-machine-name.tailnet-hash.ts.net
This is how Telegram (or any external service) will securely reach your local n8n instance - no port forwarding, no dynamic DNS, no stress.
Part 3: Deploying n8n for Workflow Automation

n8n is a powerful, open-source workflow automation tool that will act as the brain of your AI agent, connecting Telegram, LLMs, and vector databases.
Create a project directory to store your n8n environment configuration and Docker Compose files, then navigate into it:
mkdir n8n-compose
cd n8n-compose
Create a local files directory:
mkdir local-files
Find your machine's LAN IP (pick one interface/IP you use at home):
hostname -I
Create compose.yaml:
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    restart: always
    ports:
      - "5678:5678" # reachable at http://<LAN_IP>:5678
    environment:
      - N8N_HOST=<LAN_IP> # put your LAN IP or hostname here
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=https://your-machine-name.tailnet-hash.ts.net # tailscale funnel
      - N8N_RUNNERS_ENABLED=true
      - N8N_SECURE_COOKIE=false
      - GENERIC_TIMEZONE=Asia/Almaty
      - TZ=Asia/Almaty
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
    volumes:
      - n8n_data:/home/node/.n8n
      - ./local-files:/files

volumes:
  n8n_data:
Start n8n by typing:
sudo docker compose up -d
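Optionally, follow the container logs to confirm n8n came up cleanly (the exact log lines depend on your n8n version):
sudo docker compose logs -f n8n
Once n8n reports that the editor is accessible, you're good to go.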
Open your browser and go to:
http://<workstation-ip>:5678
To create a Funnel, use the tailscale funnel command and pass the target port you want to share:
sudo tailscale funnel 5678
You'll see output like:
Available on the internet:
https://your-machine-name.tailnet-hash.ts.net
|-- proxy http://127.0.0.1:5678
Press Ctrl+C to exit.
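A quick way to confirm the Funnel is forwarding traffic is to request the public URL from any device, even one outside your network (substitute the hostname from your own Funnel output):
curl -I https://your-machine-name.tailnet-hash.ts.net
A successful HTTP response means external services like Telegram can now reach your n8n instance.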
On the Workstation (RTX 5090):
First, start the Router Agent LLM (Gemma 3 4B QAT) using the llama.cpp server:
./build/bin/llama-server -hf ggml-org/gemma-3-4b-it-qat-GGUF -c 0 -fa on --jinja --host 0.0.0.0 --port 9000
Start the Lightweight LLM (Gemma 3 12B QAT):
./build/bin/llama-server -hf ggml-org/gemma-3-12b-it-qat-GGUF -c 0 -fa on --jinja --host 0.0.0.0 --port 10000
On the Jetson AGX Thor:
Start the Heavyweight LLM (GPT-OSS 120B):
./build/bin/llama-server -hf bartowski/openai_gpt-oss-120b-GGUF --jinja --host 0.0.0.0 --port 9000
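Each llama-server instance exposes an OpenAI-compatible API. As a smoke test, you can send a chat request with curl (the host and port below assume the Router Agent instance on the workstation; adjust for your setup):
curl http://<workstation-ip>:9000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
A JSON reply containing a generated message confirms the server is ready for n8n to call.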
Pro Tip: Use different ports to avoid conflicts.

Part 4: Simple Demo - Chat Trigger + Routing Agent
Here is a simple example of an n8n workflow where a user can interact with LLMs via an n8n chat trigger. The routing agent, a small and fast LLM (Gemma 3 4B QAT), analyzes each incoming user query and decides which model should handle it.
This agent's only job is to analyze the query and output structured JSON like:
{
"prompt": "Whatβs the capital of France?",
"model": "gemma-3-12b-it-qat-GGUF"
}
Or, for a more complex question:
{
"prompt": "Compare transformer architectures in LLMs and explain trade-offs for edge deployment.",
"model": "openai_gpt-oss-120b"
}
Below is a screenshot of the sample output.
Let's look at a more complex demo app.
Part 5: Real-World Demo: Telegram AI Assistant

Now let's scale this into a real-world application. Users can interact naturally via the Telegram messenger:
The magic happens in n8n's workflow. Here's how it works:
- Telegram Trigger: Listens for user messages.
- Routing Agent (Gemma 3 4B QAT): Receives the raw query. Its only job is to output structured JSON:
{
"prompt": "user's original prompt",
"model": "gemma-3-12b-it-qat-GGUF" // or "openai_gpt-oss-120b"
}
- Switch Node: Routes execution based on the model field.
- AI Agent Nodes: Each connected to its respective LLM endpoint (via OpenAI-compatible API).
- Memory & Tools: Each agent has its own memory buffer and optional tools (calculator, Wikipedia, web search).
- Telegram Response: Sends the final answer back to the user.
Import the workflow JSON (provided below) into n8n. This workflow wires together the trigger, routing agent, switch, AI agent, and Telegram response nodes described above:
{
"nodes": [
{
"parameters": {
"schemaType": "manual",
"inputSchema": "{\n\t\"type\": \"object\",\n\t\"properties\": {\n\t\t\"prompt\": {\n\t\t\t\"type\": \"string\"\n\t\t},\n\t\t\"model\": {\n\t\t\t\"type\": \"string\"\n\t\t}\n\t}\n}"
},
"id": "cc1002dd-c4de-4bac-96d5-97e849ae14d3",
"name": "Structured Output Parser",
"type": "@n8n/n8n-nodes-langchain.outputParserStructured",
"position": [
-272,
304
],
"typeVersion": 1.2
},
{
"parameters": {},
"id": "b23223ba-37a7-4efd-b99b-137c47cd624b",
"name": "Think",
"type": "@n8n/n8n-nodes-langchain.toolThink",
"position": [
368,
400
],
"typeVersion": 1
},
{
"parameters": {},
"id": "84061271-0779-4289-9c14-d365f0a8ea9a",
"name": "Calculator",
"type": "@n8n/n8n-nodes-langchain.toolCalculator",
"position": [
496,
400
],
"typeVersion": 1
},
{
"parameters": {
"sessionIdType": "customKey",
"sessionKey": "=chat_with_{{ $('Listen for incoming events').first().json.message.chat.id }}"
},
"id": "3bcc650f-0ba0-4ef9-8817-559c1a529846",
"name": "Simple Memory1",
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"position": [
256,
400
],
"typeVersion": 1.3
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.output.prompt }}",
"options": {
"systemMessage": "=You have access to a web_search tool that allows you to browse the internet for up-to-date information. Here's how you should operate:\n\n1. Website Information:\nFamiliarize yourself with this information about the website you're assisting. Use this as context for user interactions.\n\n2. Web Search Tool:\nYou have access to a web_search tool that can browse the internet. To use it, write the variable {web_search_question}. The tool will return relevant search results. Set the variable {model} to {{ $json.output.model }}.\n\n3. Handling User Queries:\nWhen a user asks a question, follow these steps:\na) Analyze the query to determine if it's related to the website or requires external information.\nb) If the query is about the website, use the provided website information to answer.\nc) If external information is needed, use the web_search tool to find relevant data.\n\n4. Using web_search:\n- Use web_search for factual, current information that isn't provided in the website info.\n- Formulate clear, concise search queries.\n- If the first search doesn't yield useful results, refine your query and search again.\n- Limit searches to a maximum of three per user query to maintain efficiency.\n\n5. Using Think:\n- Using Think tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.\n\n6. Formulating Responses:\n- Begin with information from the website if relevant.\n- Incorporate web search results to provide up-to-date, accurate information.\n- Summarize findings concisely and coherently.\n- If you're unsure or can't find reliable information, be honest about limitations.\n\n7. Ethical Considerations:\n- Respect user privacy. Don't ask for or store personal information.\n- Provide factual information. Avoid speculation or unverified claims.\n- If asked about controversial topics, strive for a balanced, neutral response.\n- Don't engage in or encourage illegal activities.\n\n8. Output Format:\nDo not include your thought process, web searches, or any other tags in the final output.\n"
}
},
"id": "73f26cd6-8e3a-4987-b3f0-71b12e29a2a0",
"name": "AI Agent1",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
384,
224
],
"typeVersion": 1.9
},
{
"parameters": {},
"type": "@n8n/n8n-nodes-langchain.toolWikipedia",
"typeVersion": 1,
"position": [
624,
400
],
"id": "f7d637b3-2542-40d2-b003-351e52ecf385",
"name": "Wikipedia1"
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.output.prompt }}",
"options": {
"systemMessage": "=You have access to a web_search tool that allows you to browse the internet for up-to-date information. Here's how you should operate:\n\n1. Website Information:\nFamiliarize yourself with this information about the website you're assisting. Use this as context for user interactions.\n\n2. Web Search Tool:\nYou have access to a web_search tool that can browse the internet. To use it, write the variable {web_search_question}. The tool will return relevant search results. Set the variable {model} to {{ $json.output.model }}.\n\n3. Handling User Queries:\nWhen a user asks a question, follow these steps:\na) Analyze the query to determine if it's related to the website or requires external information.\nb) If the query is about the website, use the provided website information to answer.\nc) If external information is needed, use the web_search tool to find relevant data.\n\n4. Using web_search:\n- Use web_search for factual, current information that isn't provided in the website info.\n- Formulate clear, concise search queries.\n- If the first search doesn't yield useful results, refine your query and search again.\n- Limit searches to a maximum of three per user query to maintain efficiency.\n\n5. Using Think:\n- Using Think tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.\n\n6. Formulating Responses:\n- Begin with information from the website if relevant.\n- Incorporate web search results to provide up-to-date, accurate information.\n- Summarize findings concisely and coherently.\n- If you're unsure or can't find reliable information, be honest about limitations.\n\n7. Ethical Considerations:\n- Respect user privacy. Don't ask for or store personal information.\n- Provide factual information. Avoid speculation or unverified claims.\n- If asked about controversial topics, strive for a balanced, neutral response.\n- Don't engage in or encourage illegal activities.\n\n8. Output Format:\nDo not include your thought process, web searches, or any other tags in the final output.\n"
}
},
"id": "61936908-d4dd-4695-9fd9-9e876f029cf3",
"name": "AI Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
384,
-192
],
"typeVersion": 1.9
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.message.text }}",
"hasOutputParser": true,
"options": {
"systemMessage": "=You are a **Routing Agent**.\n\nYour task is to analyze user queries and determine the most appropriate model to handle each specific use case.\n\n## Available Models\n\nYou have access to the following models:\n\n1. **gemma-3-12b-it-qat-GGUF**\n2. **openai_gpt-oss-120b**\n\n## Model Strengths\n\n### 1. gemma-3-12b-it-qat-GGUF\n- Standard decision-making tasks\n- Real-time workflow routing\n- Data validation and processing\n- Pattern recognition in structured data\n- Routine business logic evaluation\n\n### 2. openai_gpt-oss-120b\n- Complex multi-factor decision scenarios\n- Advanced data analysis requiring deep reasoning\n- Critical business decisions with high impact\n- Complex pattern recognition in unstructured data\n- Strategic workflow optimization\n\n## Output Format\n\nYour output must always be a valid JSON object in the following format:\n\n```json\n{\n \"prompt\": \"user query goes here\",\n \"model\": \"selected-model-name\"\n}\n```\n\n- The **\"prompt\"** field should contain the exact query to be sent to the selected model.\n- The **\"model\"** field should contain the model name (one of: gemma-3-12b-it-qat-GGUF, openai_gpt-oss-120b).\n\n**Important:** Only return the JSON object. Do not include any explanations or additional text."
}
},
"id": "8b5269fb-3ea0-419f-94ee-502c6ed42368",
"name": "Routing Agent",
"type": "@n8n/n8n-nodes-langchain.agent",
"position": [
-432,
128
],
"typeVersion": 1.9
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.output.model }}",
"rightValue": "gemma-3-12b-it-qat-GGUF",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "1abce04e-a3a1-49d7-bfae-dca52e3a8354"
}
],
"combinator": "and"
}
},
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "a30e8884-f149-459b-8a7f-6265b9d4dd5e",
"leftValue": "={{ $json.output.model }}",
"rightValue": "openai_gpt-oss-120b",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
-144,
128
],
"id": "7dbffd4f-b66e-46d3-bd46-9adc921c9eef",
"name": "Switch"
},
{
"parameters": {
"content": "## NVIDIA GeForce RTX 5090 Workstation\n\n",
"height": 399,
"width": 656,
"color": 4
},
"id": "a5fab0f3-cfe3-4eaf-8366-8574ebf099ce",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
96,
-256
],
"typeVersion": 1
},
{
"parameters": {
"content": "## Nvidia Jetson AGX Thor Developer Kit\n\n",
"height": 400,
"width": 659
},
"id": "cfae9f47-8886-46a9-a2c7-27ef781a9839",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
96,
160
],
"typeVersion": 1
},
{
"parameters": {
"content": "## Dynamic Model Routing for Optimal AI Responses\n\nThe Routing agent is a dynamic, AI-powered routing system that automatically selects the most appropriate large language model (LLM) to respond to a user's query based on the query's content and purpose. The Router uses lightweight and highly efficient gemma-3-4b-it-qat, optimized for low-latency inference and hosted locally on the NVIDIA RTX 5090 workstation. \n\n**Simple Queries** \nFor straightforward requests β such as factual lookups, basic definitions, or short summarizations β the Router assigns the task to the lightweight and efficient gemma-3-12b-it-qat model, optimized for speed and running locally on the RTX 5090. \n\n**Complex Queries**\nFor advanced tasks requiring multi-step reasoning, deep analysis, creative ideation, or nuanced contextual understanding β the Router escalates the request to the high-capacity openai_gpt-oss-120b model, hosted on the powerful NVIDIA Jetson AGX Thor platform for maximum performance. \n",
"height": 819,
"width": 775,
"color": 3
},
"id": "9a01437c-f003-4ed4-b598-513f13b48a36",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-704,
-256
],
"typeVersion": 1
},
{
"parameters": {
"updates": [
"message"
],
"additionalFields": {}
},
"id": "124b2cfa-d6af-4569-8f08-cff9fe64cafd",
"name": "Listen for incoming events",
"type": "n8n-nodes-base.telegramTrigger",
"position": [
-656,
128
],
"webhookId": "322dce18-f93e-4f86-b9b1-3305519b7834",
"typeVersion": 1,
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
"text": "={{ $json.output }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "16e73b18-3173-49c9-9d2a-9c6e7b27bee7",
"name": "Telegram",
"type": "n8n-nodes-base.telegram",
"position": [
800,
64
],
"typeVersion": 1.1,
"webhookId": "df3e62fe-25fb-481f-a4ad-255a167818bb",
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
},
"onError": "continueErrorOutput"
},
{
"parameters": {
"chatId": "={{ $('Listen for incoming events').first().json.message.from.id }}",
"text": "={{ $('AI Agent').item.json.output.replace(/&/g, \"&\").replace(/>/g, \">\").replace(/</g, \"<\").replace(/\"/g, \""\") }}",
"additionalFields": {
"appendAttribution": false,
"parse_mode": "HTML"
}
},
"id": "ddc949cc-c29e-49d2-9efb-24854413cd34",
"name": "Correct errors",
"type": "n8n-nodes-base.telegram",
"position": [
976,
192
],
"typeVersion": 1.1,
"webhookId": "cd30e054-0370-4aef-b7bf-483c1e73c449",
"credentials": {
"telegramApi": {
"id": "jsrQ9DQ2KROw5Ouy",
"name": "Telegram account"
}
}
},
{
"parameters": {
"content": "## Telegram messenger\n\n",
"height": 815,
"width": 400,
"color": 6
},
"id": "e3fb7971-90be-4008-8c9b-9f44c85ee4ae",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
768,
-256
],
"typeVersion": 1
},
{
"parameters": {
"sessionIdType": "customKey",
"sessionKey": "=chat_with_{{ $('Listen for incoming events').first().json.message.chat.id }}"
},
"type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
"typeVersion": 1.3,
"position": [
480,
-16
],
"id": "0b2e9296-c249-4d4b-9dc1-611a7bbd950e",
"name": "Simple Memory"
},
{
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4.1-mini"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
-528,
304
],
"id": "ff67cbbf-00dd-4e64-936b-129077ae4319",
"name": "gemma-3-4b-it-qat",
"credentials": {
"openAiApi": {
"id": "yv0BO6gWekx7HdgZ",
"name": "Routing LLM"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4.1-mini"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
320,
-16
],
"id": "fa38fb9f-02c9-4adc-b151-b6e6d9ae4d96",
"name": "gemma-3-12b-it-qat",
"credentials": {
"openAiApi": {
"id": "qdAOjjK7YVwe17tU",
"name": "Poor LLM"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"value": "{{ $json.output.model }}",
"mode": "id"
},
"options": {
"maxTokens": 500
}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
128,
400
],
"id": "bd1ee1ca-053f-4421-a8cd-135c827ab4f3",
"name": "openai_gpt-oss-120b",
"credentials": {
"openAiApi": {
"id": "erOxMyvS0PxB4Hm4",
"name": "LLM"
}
}
}
],
"connections": {
"Structured Output Parser": {
"ai_outputParser": [
[
{
"node": "Routing Agent",
"type": "ai_outputParser",
"index": 0
}
]
]
},
"Think": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"Calculator": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"Simple Memory1": {
"ai_memory": [
[
{
"node": "AI Agent1",
"type": "ai_memory",
"index": 0
}
]
]
},
"AI Agent1": {
"main": [
[
{
"node": "Telegram",
"type": "main",
"index": 0
}
]
]
},
"Wikipedia1": {
"ai_tool": [
[
{
"node": "AI Agent1",
"type": "ai_tool",
"index": 0
}
]
]
},
"AI Agent": {
"main": [
[
{
"node": "Telegram",
"type": "main",
"index": 0
}
]
]
},
"Routing Agent": {
"main": [
[
{
"node": "Switch",
"type": "main",
"index": 0
}
]
]
},
"Switch": {
"main": [
[
{
"node": "AI Agent",
"type": "main",
"index": 0
}
],
[
{
"node": "AI Agent1",
"type": "main",
"index": 0
}
]
]
},
"Listen for incoming events": {
"main": [
[
{
"node": "Routing Agent",
"type": "main",
"index": 0
}
]
]
},
"Telegram": {
"main": [
[],
[
{
"node": "Correct errors",
"type": "main",
"index": 0
}
]
]
},
"Simple Memory": {
"ai_memory": [
[
{
"node": "AI Agent",
"type": "ai_memory",
"index": 0
}
]
]
},
"gemma-3-4b-it-qat": {
"ai_languageModel": [
[
{
"node": "Routing Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"gemma-3-12b-it-qat": {
"ai_languageModel": [
[
{
"node": "AI Agent",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"openai_gpt-oss-120b": {
"ai_languageModel": [
[
{
"node": "AI Agent1",
"type": "ai_languageModel",
"index": 0
}
]
]
}
},
"pinData": {},
"meta": {
"instanceId": "02c687c2c688456cd1989d281e369e5238e96172a8448f02fdd7e663783000c4"
}
}
You'll need to configure credentials:
- Telegram Bot API (for trigger + response)
- OpenAI-compatible endpoints for each LLM using llama.cpp.
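For the llama.cpp endpoints, create OpenAI credentials in n8n whose Base URL points at each llama-server instance; the API key can be any non-empty placeholder, since llama-server only enforces a key when started with --api-key. For example (assuming the hosts and ports used earlier):
- Routing LLM: http://<workstation-ip>:9000/v1
- Lightweight LLM: http://<workstation-ip>:10000/v1
- Heavyweight LLM: http://<jetson-ip>:9000/v1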
This workflow is fully modular.
You now have a fully functional, multi-node, dynamically routed AI agent system built using n8n and Telegram Messenger. Whether you're building a smart assistant, an industrial AI agent, or an autonomous robotics controller, this routing architecture ensures your system remains responsive, efficient, and intelligent.
Smart routing can reduce LLM costs and improve performance without sacrificing quality.
I hope you found this guide useful, and thanks for reading. If you have any questions or feedback, leave a comment below. If you liked this post, please support me by subscribing to my Hackster blog.
Stay tuned!