When I first encountered the HuskyLens 2, several features immediately stood out as major improvements over the original V1: native Wi-Fi connectivity, expanded memory, and a comprehensive suite of pre-installed AI models. However, the feature that caught my attention was the MCP Server.
MCP stands for Model Context Protocol. This feature allows the camera to expose its internal AI functions as a set of callable tools, making them available for integration with external systems like Large Language Models (LLMs) and custom applications.
This capability creates a powerful synergy: you can now combine the local, specialized AI functions of the HuskyLens (like object detection and face recognition) with the reasoning, natural language understanding, and emerging features of powerful external LLMs.
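Under the hood, MCP tool invocations are JSON-RPC 2.0 messages. As a minimal sketch (the tool name is one the HuskyLens exposes, described later; the exact transport between client and camera is a detail the client script handles), a `tools/call` request can be assembled like this:

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request, as defined by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Example: ask the camera for its current recognition result.
req = build_tool_call(1, "get_recognition_result", {"operation": "get_result"})
print(json.dumps(req))
```

This is the shape of message an LLM-driven client sends each time the model decides to use one of the camera's tools.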
While DFRobot provided an initial example using a third-party desktop application called Cherry Studio, I was keen to determine whether direct, pure-Python integration was possible using a modern, accessible LLM like Gemini Flash.
My HuskyLens 2 was an early maker release, which required a firmware upgrade to enable the latest features. The process was perfectly detailed and documented on the DFRobot website. I must pause here to offer my genuine thanks: the time and effort DFRobot invests in thorough documentation is invaluable. As makers working on multiple projects, nothing is more frustrating than losing time on setup and configuration simply because hardware companies neglect essential documentation.
You will need to download the following files:
- Firmware image: huskylensV2-v1.1.6.1031.img.7z
- Burning tool: K230BurningTool.zip
- Driver installation tool: Zadig
All necessary steps and details are available here.
Once firmware version 1.1.6 is successfully installed, navigate to the settings, connect the HuskyLens to your local Wi-Fi router, and ensure the MCP Server is enabled.
2. LLM and API Configuration

Next, you'll need a Google Gemini API key. You can obtain one easily at the Google AI Studio website: https://aistudio.google.com/app/api-keys. Google offers a generous free tier, and the paid options remain highly affordable for more intensive usage.
Finally, connect the camera to your Wi-Fi network and note the IP address assigned to the MCP Server. Open the Python client script (HuskyMCPChat.py) with a text editor and configure your Gemini API key and the MCP Server IP address within the script variables.
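As an illustration, the configuration section looks something like the sketch below. The variable names, port, and endpoint path here are assumptions for illustration; check HuskyMCPChat.py itself for the actual names it uses.

```python
# Illustrative configuration block -- the real variable names in
# HuskyMCPChat.py may differ, so verify against the script itself.
GEMINI_API_KEY = "your-gemini-api-key"   # obtained from Google AI Studio
MCP_SERVER_IP = "192.168.1.42"           # IP shown in the HuskyLens settings
MCP_SERVER_PORT = 5000                   # assumed port; check your device

def mcp_endpoint(ip, port):
    """Build the base URL for the camera's MCP server (HTTP transport assumed)."""
    return f"http://{ip}:{port}/mcp"

print(mcp_endpoint(MCP_SERVER_IP, MCP_SERVER_PORT))
```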
Run the client with:

$ python HuskyMCPChat.py

Usage and Interactivity

With the Python client running, you can use a menu and natural language commands to interact with the camera via the LLM:
- Change Algorithms: Switch algorithms (e.g., "switch to face recognition").
- Take Photos: Capture images, which are stored on the internal memory ("take a picture").
- Visual Query: Ask the LLM what the camera currently sees based on the active algorithm ("what do you see?").
- Combined Reasoning: Combine the camera's recognition data with an LLM prompt for queries such as: "Is there anything dangerous on the table?"
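For the LLM to translate these natural-language commands into camera actions, the client must describe the camera's tools to Gemini as function declarations. A sketch of what two of those declarations might look like, using the OpenAPI-style JSON schema that Gemini's function-calling API accepts (the parameter names here are assumptions inferred from the tool descriptions, not the camera's verified schema):

```python
# Sketch: describing two HuskyLens MCP tools to Gemini as function
# declarations. Parameter names and enums are assumptions based on the
# tool descriptions -- verify against the camera's actual tool schema.
huskylens_tools = [
    {
        "name": "manage_applications",
        "description": "Query or switch the active HuskyLens algorithm.",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["current_application", "switch_application", "application_list"],
                },
                "application": {"type": "string"},
            },
            "required": ["operation"],
        },
    },
    {
        "name": "get_recognition_result",
        "description": "Get the camera's current recognition result.",
        "parameters": {
            "type": "object",
            "properties": {"operation": {"type": "string", "enum": ["get_result"]}},
            "required": ["operation"],
        },
    },
]
print([t["name"] for t in huskylens_tools])
```

When Gemini returns a function call naming one of these tools, the client forwards it to the MCP server and feeds the result back into the conversation.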
The following tools are exposed by the HuskyLens MCP Server and are callable by the LLM:
get_recognition_result
Obtains the real-time recognition result from HuskyLens, including image data and recognized labels (e.g., object type, person name). The primary operation is get_result. This is crucial for visual reasoning and generating natural-language descriptions of the camera's view.
manage_applications
Used to manage and query all internal applications (algorithms) of the HuskyLens. Supports operations like current_application, switch_application, and application_list.
multimedia_control
Provides control over the HuskyLens multimedia components, primarily the camera. The main operation is take_photo.
task_scheduler
Manages scheduled tasks. Call this tool to create a timed or triggered action, such as 'Take a picture when you see the keyboard' or 'Take a picture after 3 seconds'. Supports create and list operations. Tasks are defined by a trigger (optional, e.g., 'tiger'), a handler (required; currently only take_photo is supported), and an optional timestamp for the scheduled time.
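To make the task_scheduler contract concrete, here is a small sketch that assembles the arguments for a create call. The field names follow the tool description above; treat them as assumptions until verified against the camera's actual schema.

```python
def build_task(handler, trigger=None, timestamp=None):
    """Assemble arguments for a task_scheduler 'create' call.
    Field names follow the tool description; verify against the
    camera's real schema before relying on them."""
    if handler != "take_photo":
        raise ValueError("only take_photo is currently supported")
    task = {"operation": "create", "handler": handler}
    if trigger is not None:
        task["trigger"] = trigger      # e.g. an object label such as "keyboard"
    if timestamp is not None:
        task["timestamp"] = timestamp  # scheduled time for the action
    return task

# "Take a picture when you see the keyboard"
print(build_task("take_photo", trigger="keyboard"))
```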
The HuskyLens 2's MCP Server is a powerful foundation, but future upgrades could make it even better. Useful additions could include:
- The ability to retrieve the pictures taken.
- Expanded support for data retrieval from specialized algorithms such as OCR (Optical Character Recognition), QR codes, barcodes, etc.
The HuskyLens 2 represents a significant leap forward compared to its predecessor. It comes fully loaded with useful, pre-trained models like OCR, license plate recognition, and various object and face recognition capabilities, plus the ability to train and install custom models. The introduction of the MCP Server opens up entirely new possibilities for interconnection, transforming the HuskyLens from a standalone device into a powerful vision node within a broader LLM-driven AI ecosystem.