Published October 21, 2025 © CC BY-ND

Create Your Own AI Voice Agent Using ESP32 and Rust

This article shows how to run a Voice AI Agent on an ESP32. All the code here is open source.

Beginner | Full instructions provided | 1.5 hours


Things used in this project

Hardware components

USB-A to B Cable

EchoKit

Software apps and online services

EchoKit Firmware

EchoKit server

Hand tools and fabrication machines

Laptop

Story

Voice AI assistants are everywhere today, but most are based on proprietary models and cloud services and offer little opportunity to explore how they actually work. With EchoKit, you can build your own local voice AI assistant on an ESP32 board — fully open-source, educational, and customizable.

This project is designed for makers, students, educators, and AI enthusiasts who want hands-on experience with modern AI technologies. EchoKit integrates end-to-end voice models, speech-to-text, large language models, and text-to-speech into a compact device, giving you the chance to experiment safely and learn how these systems communicate in real time.

By building and customizing your own EchoKit, you’ll not only get an interactive voice assistant, but also a deeper understanding of AI pipelines, firmware development, and device integration. It’s perfect for classroom demonstrations, makerspace projects, or personal experiments in AI.

System Overview

The diagrams below illustrate the overall architecture — from speech input to AI response.

Overview

Workflow

For voice interaction, EchoKit supports both the ASR-LLM-TTS pipeline (the classic modular approach) and an end-to-end pipeline. In my view, the ASR-LLM-TTS pipeline is the more customizable of the two: for example, you can add an MCP server or a knowledge base. See the linked comparison for more on the differences between the ASR-LLM-TTS pipeline and the end-to-end pipeline.
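To make the difference between the two pipeline styles concrete, here is a minimal Rust sketch. It is not taken from the EchoKit server code: the trait names, the `VoicePipeline` enum, and the stub stages are all hypothetical and exist only for illustration. The point it tries to show is that the modular pipeline has three separate stages you can swap or extend, while the end-to-end pipeline is a single audio-in, audio-out model.

```rust
/// Speech-to-text stage (ASR). Hypothetical trait, for illustration only.
trait Asr {
    fn transcribe(&self, audio: &[u8]) -> String;
}

/// Language-model stage (LLM). Hypothetical trait.
trait Llm {
    fn reply(&self, prompt: &str) -> String;
}

/// Text-to-speech stage (TTS). Hypothetical trait.
trait Tts {
    fn synthesize(&self, text: &str) -> Vec<u8>;
}

/// A single model that maps input audio directly to output audio.
trait SpeechToSpeech {
    fn respond(&self, audio: &[u8]) -> Vec<u8>;
}

/// The two pipeline styles described in the text.
enum VoicePipeline {
    /// Three swappable stages. Text is available between stages, which is
    /// where tools or a knowledge base could be plugged in.
    Modular {
        asr: Box<dyn Asr>,
        llm: Box<dyn Llm>,
        tts: Box<dyn Tts>,
    },
    /// One opaque model; there is no intermediate text to hook into.
    EndToEnd(Box<dyn SpeechToSpeech>),
}

impl VoicePipeline {
    /// Turns one utterance (raw audio bytes) into reply audio.
    fn respond(&self, audio: &[u8]) -> Vec<u8> {
        match self {
            VoicePipeline::Modular { asr, llm, tts } => {
                let text = asr.transcribe(audio);
                let answer = llm.reply(&text);
                tts.synthesize(&answer)
            }
            VoicePipeline::EndToEnd(model) => model.respond(audio),
        }
    }
}

// Stub stages so the sketch runs without any network calls.
// They treat the "audio" bytes as UTF-8 text.
struct StubAsr;
impl Asr for StubAsr {
    fn transcribe(&self, audio: &[u8]) -> String {
        String::from_utf8_lossy(audio).into_owned()
    }
}

struct StubLlm;
impl Llm for StubLlm {
    fn reply(&self, prompt: &str) -> String {
        format!("You said: {}", prompt)
    }
}

struct StubTts;
impl Tts for StubTts {
    fn synthesize(&self, text: &str) -> Vec<u8> {
        text.as_bytes().to_vec()
    }
}

/// Stub end-to-end model: returns the input bytes unchanged.
struct StubEndToEnd;
impl SpeechToSpeech for StubEndToEnd {
    fn respond(&self, audio: &[u8]) -> Vec<u8> {
        audio.to_vec()
    }
}

/// Builds a modular pipeline from the three stub stages.
fn stub_modular_pipeline() -> VoicePipeline {
    VoicePipeline::Modular {
        asr: Box::new(StubAsr),
        llm: Box::new(StubLlm),
        tts: Box::new(StubTts),
    }
}
```

Calling `stub_modular_pipeline().respond(b"hello")` walks through all three stages in order. In a real setup each stub would be replaced by a client for the service you configure on the server.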

Next, let's build a voice AI agent together.

Step 1: Assemble the hardware

The EchoKit hardware is made up of several components:

ESP32-S3 development board
Extension board with audio and microphone modules
Mini speaker
1.54” LCD screen

Let's assemble the hardware together.

Connect the mini speaker to the audio module in the middle of the extension board.

Mount the ESP32-S3 board onto the extension board.

Insert the LCD screen into the top slot of the extension board.

Once assembled, you should have a fully functional EchoKit hardware setup.

Step 2: Flash the firmware

Now that the hardware is assembled, it’s time to flash the firmware onto the device. This will enable the EchoKit to communicate with the server and AI models.

1. Connect the EchoKit device to your computer using the USB-C cable.

2. Use the espflash command-line tool to flash the firmware.

Since the EchoKit firmware is written in Rust, you will need to install the Rust toolchain, along with espflash and its dependencies.

cargo install cargo-espflash espflash ldproxy

Get the latest firmware.

curl -L -o echokit https://echokit.dev/firmware/echokit

Flash the firmware to the EchoKit device.

espflash flash --monitor --flash-size 16mb echokit

You will see the following output. The source code for the EchoKit firmware is available on GitHub.

I (2862) phy_init: Saving new calibration data due to checksum failure or outdated calibration data, mode(2)
I (2879) esp32_nimble::ble_device: BLE Host Task Started
I (2882) NimBLE: GAP procedure initiated: stop advertising.

I (2884) esp32_nimble::ble_device: Device Address: 98:A3:16:F0:1C:1E
I (2887) NimBLE: GAP procedure initiated: advertise; 
I (2890) NimBLE: disc_mode=2
I (2892) NimBLE:  adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=0 adv_itvl_max=0
I (2901) NimBLE: 

I (2904) echokit: Free SPIRAM heap size: 5248788
I (2907) echokit: Free INTERNAL heap size: 81851
I (4541) esp_idf_hal::interrupt::asynch: IsrReactor "IsrReactor" started.

Once flashed, the EchoKit device will display a QR code on the screen and announce a “Welcome” message. You’re now ready for the next step.

Step 3: Start the EchoKit server

We already have pre-set servers ready to use. If you want a quick start, you can skip this part and go to Step 4 to connect the server and device.

Instead of using a pre-set server, you can run the server on your own computer. The EchoKit server is also written in Rust, so please make sure you have Rust installed.

git clone https://github.com/second-state/echokit_server.git
cd echokit_server

Build the server in Rust.

cargo build --release

Next, edit the config.toml file to configure your AI pipeline.

# the server port and welcome voice
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

# ASR: supports Whisper models
[asr]
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "gsk_xxx"
model = "whisper-large-v3-turbo"

# supports any LLM that is compatible with the OpenAI API spec
[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_xxx"
model = "llama-3.3-70b-versatile"
history = 1

# supports GPT-SoVITS, ElevenLabs, and Groq
[tts]
platform = "Groq"
api_key = "gsk_xxx"
model = "playai-tts"
voice = "Aaliyah-PlayAI"

## supports streamable HTTP and SSE MCP servers
[[llm.mcp_server]]
server = "http://localhost:8000/mcp"
type = "http_streamable"

## Set up the prompt
[[llm.sys_prompts]]
role = "system"
content = """
# input your prompt here.
"""

I use Groq here as an example because it is very fast. For our use case, I think you likely don't even need to pay for it.

If you want to add actions via an MCP server, I recommend using a closed-source model, such as one from OpenAI.

Then, we can run the server.

# Enable debug logging
export RUST_LOG=debug

# Run the EchoKit server (append & to run it in the background)
target/release/echokit_server

You will see the output logs as below:

[2025-10-15T09:37:13Z INFO echokit_server] Hello WAV: hello.wav
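Before moving on, it can help to confirm that the server port is actually reachable, since the device will connect to it over your local network. The following is a small standalone Rust sketch, not part of the EchoKit project. The function name `server_reachable` is my own, and it only checks that a TCP connection can be opened. It does not speak the EchoKit WebSocket protocol.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (for example "127.0.0.1:8080",
/// matching the port in `addr = "0.0.0.0:8080"` from config.toml) can be
/// opened within `timeout`. An unparsable address counts as unreachable.
fn server_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false,
    }
}
```

For example, `server_reachable("127.0.0.1:8080", Duration::from_secs(2))` should return true on the machine running the server. Running the same check from another computer against the server's LAN IP is a closer match to what the EchoKit device will do.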

Step 4: Connect the EchoKit server and device

Now that the server is running, it’s time to connect it to your EchoKit device.

1. Open https://echokit.dev/setup/ in your browser. Ensure your browser supports Bluetooth.

2. Click “Connect to EchoKit” and pair the device with your server.

3. Enter the following information on the new page:

Wi-Fi Name and Password (2.4GHz network required)
Server URL: the IP address and port of your running EchoKit server, for example ws://192.168.1.56:8080/ws. If you don't run your own EchoKit server, you can use ws://indie.echokit.dev/ws, which is provided by the EchoKit project.

4. Press the K0 button at the top left of the EchoKit device to apply the settings.

Once connected, the EchoKit will display status updates like “Connecting to Wi-Fi” and “Connecting to Server.” When the connection is successful, you’ll hear a welcome voice and see a “Hello Set” message on the LCD.
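If you run your own server, the Server URL in this step combines your computer's LAN IP, the port from the `addr` line in config.toml, and the `/ws` path shown in the example URL. Here is a hedged Rust sketch of that mapping. The helper `device_ws_url` is hypothetical and not part of EchoKit. It also rejects `0.0.0.0` and loopback addresses, on the assumption that the device cannot reach the server through them.

```rust
use std::net::{IpAddr, SocketAddr};

/// Builds the WebSocket URL to enter on the setup page.
/// `bind_addr` is the `addr` value from config.toml (e.g. "0.0.0.0:8080");
/// `lan_ip` is the IP of the computer running the server on your Wi-Fi.
fn device_ws_url(bind_addr: &str, lan_ip: &str) -> Result<String, String> {
    let bind = bind_addr
        .parse::<SocketAddr>()
        .map_err(|e| format!("bad server addr {}: {}", bind_addr, e))?;
    let ip = lan_ip
        .parse::<IpAddr>()
        .map_err(|e| format!("bad LAN IP {}: {}", lan_ip, e))?;

    // 0.0.0.0 and 127.0.0.1 only make sense on the server machine itself.
    if ip.is_unspecified() || ip.is_loopback() {
        return Err(format!("{} is not reachable from the device", ip));
    }

    // Reuse the configured port with the LAN IP; "/ws" follows the
    // example URL given in this step.
    Ok(format!("ws://{}/ws", SocketAddr::new(ip, bind.port())))
}
```

With the values used in this article, `device_ws_url("0.0.0.0:8080", "192.168.1.56")` yields `ws://192.168.1.56:8080/ws`.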

Step 5: Talk with the EchoKit

You’re now ready to interact with your EchoKit device!

Press the K0 button to start voice input.
Once you see “Listening” on the screen, speak to the device.
The EchoKit will process your speech using ASR, send the text to the LLM for a response, and then use TTS to speak the reply back to you.
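One way to picture a single conversation turn is as a small state machine. The Rust sketch below is only a mental model and is not how the EchoKit firmware is actually written. Apart from "Listening", which is the label shown on the screen, the state and event names are my own assumptions.

```rust
/// States of one conversation turn (hypothetical names, except Listening).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Thinking,
    Speaking,
}

/// Things that can happen during a turn (hypothetical names).
#[derive(Debug, Clone, Copy)]
enum Event {
    K0Pressed,      // user presses the K0 button
    SpeechCaptured, // audio has been recorded and sent to the server
    ReplyReady,     // server has produced reply audio (ASR -> LLM -> TTS)
    PlaybackDone,   // the speaker has finished playing the reply
}

/// Returns the next state; events that do not apply leave the state unchanged.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::K0Pressed) => State::Listening,
        (State::Listening, Event::SpeechCaptured) => State::Thinking,
        (State::Thinking, Event::ReplyReady) => State::Speaking,
        (State::Speaking, Event::PlaybackDone) => State::Idle,
        (unchanged, _) => unchanged,
    }
}
```

Feeding the four events in order takes the device from Idle through Listening, Thinking, and Speaking, and back to Idle, ready for the next button press.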

What's next?

Once you’ve built the base system, try expanding it:

Add MCP servers to trigger smart actions.
Integrate IoT control — lights, sensors, motors, etc.
Experiment with different TTS or LLM providers.
Create custom prompts for personality or domain-specific behavior.

You can find full documentation and source code at:👉 https://echokit.dev/

Code

Credits

Vivian Hu
