Voice AI assistants are everywhere today, but most are based on proprietary models and cloud services and offer little opportunity to explore how they actually work. With EchoKit, you can build your own local voice AI assistant on an ESP32 board — fully open-source, educational, and customizable.
This project is designed for makers, students, educators, and AI enthusiasts who want hands-on experience with modern AI technologies. EchoKit integrates end to end model, speech-to-text, large language models, and text-to-speech into a compact device, giving you the chance to experiment safely and learn how these systems communicate in real time.
By building and customizing your own EchoKit, you’ll not only get an interactive voice assistant, but also a deeper understanding of AI pipelines, firmware development, and device integration. It’s perfect for classroom demonstrations, makerspace projects, or personal experiments in AI.
System OverviewThe diagrams below illustrate the overall architecture — from speech input to AI response.
For the voice interaction, we support the ASR-LLM-TTS pipeline (classic modular approach) and the end-to-end pipeline. In my view, the ASR-LLM-TTS pipeline allows you to customize more, like adding MCP server and knowledge base. Click here to learn more differences between the ASR-LLM-TTS pipeline and the end-to-end pipeline.
Next, let's build a voice AI agent together.
Step 1: Assemble the hardwareThe EchoKit hardware is made up of several components
- ESP32-S3 development board
- Extension board with audio and microphone modules
- Mini speaker
- 1.54” LCD screen
Let's assemble the hardware together.
- Connect the mini speaker to the audio module in the middle of the extension board.
- Mount the ESP32-S3 board onto the extension board.
- Insert the LCD screen into the top slot of the extension board.
Once assembled, you should have a fully functional EchoKit hardware setup.
Step 2: Flash the firmwareNow that the hardware is assembled, it’s time to flash the firmware onto the device. This will enable the EchoKit to communicate with the server and AI models.
1. Connect the EchoKit device to your computer using the USB-C cable.
2. Use espflash command-line tool to flash the firmware.
Since the EchoKit firmware is written in Rust, you will need to install Rust toolchains and the espflash and its dependencies.
cargo install cargo-espflash espflash ldproxyGet the latest firmware.
curl -L -o echokit https://echokit.dev/firmware/echokitFlash the firmware to the EchoKit device.
espflash flash --monitor --flash-size 16mb echokitYou will see the following output. The source code for the EchoKit firmware is available on GitHub.
I (2862) phy_init: Saving new calibration data due to checksum failure or outdated calibration data, mode(2)
I (2879) esp32_nimble::ble_device: BLE Host Task Started
I (2882) NimBLE: GAP procedure initiated: stop advertising.
I (2884) esp32_nimble::ble_device: Device Address: 98:A3:16:F0:1C:1E
I (2887) NimBLE: GAP procedure initiated: advertise;
I (2890) NimBLE: disc_mode=2
I (2892) NimBLE: adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=0 adv_itvl_max=0
I (2901) NimBLE:
I (2904) echokit: Free SPIRAM heap size: 5248788
I (2907) echokit: Free INTERNAL heap size: 81851
I (4541) esp_idf_hal::interrupt::asynch: IsrReactor "IsrReactor" started.Once flashed, the EchoKit device will display a QR code on the screen and announce a “Welcome” message. You’re now ready for the next step.
We already have pre-set servers ready to use. If you want to quick start, you can skip this part and go to step 4 to connect the server and device.
Instead of using the pre-set server, you can run the server in your own computer. Again the EchoKit server is written in Rust, please make sure you have installed Rust.
git clone https://github.com/second-state/echokit_server.git
cd echokit_serverBuild the server in Rust.
cargo build --releaseNext, go to the config.toml file to edit your AI pipeline.
# the server port and welcome voice
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"
# the ASR model supports whisper model
[asr]
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "gsk_xxx"
model = "whisper-large-v3-turbo"
# supports any LLM model that compatible with OpenAI spec
[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_xxx"
model = "llama-3.3-70b-versatile"
history = 1
# supports GPT-SOVITs, 11Labs and Groq
[tts]
platform = "Groq"
api_key = "gsk_xxx"
model = "playai-tts"
voice = "Aaliyah-PlayAI"
## supports HTTP-Streamble and SSE MCP servers
[[llm.mcp_server]]
server = "http://localhost:8000/mcp"
type = "http_streamable"
## Set up the prompt
[[llm.sys_prompts]]
role = "system"
content = """
# input your prompt here.
"""Here I use Groq as an example, because it's very fast. For our use case, I think you even don't need to pay for this.
If you want to add actions via MCP server, I would recommend you to use a close-sourced model like OpenAI.
Then, we can run the server.
# Enable debug logging
export RUST_LOG=debug
# Run the EchoKit server in the background
target/release/echokit_serverYou will see the output logs as below:
[2025-10-15T09:37:13Z INFO echokit_server] Hello WAV: hello.wavStep 4: Connect the EchoKit server and deviceNow that the server is running, it’s time to connect it to your EchoKit device.
1. Open https://echokit.dev/setup/ in your browser. Ensure your browser supports Bluetooth.
2. Click “Connect to EchoKit” and pair the device with your server.
3. Enter the following information in the new page:
- Wi-Fi Name and Password (2.4GHz network required)
- Server URL: The IP and port of your running EchoKit server, for example: ws://192.168.1.56:8080/ws. If you don't run your own EchoKit server, you can use ws://indie.ehcokit.dev/ws, which is provided by the EchoKit project.
Press the K0 button on the left top of the EchoKit device to apply the settings.
Once connected, the EchoKit will display status updates like “Connecting to Wi-Fi” and “Connecting to Server.” When the connection is successful, you’ll hear a welcome voice and see a “Hello Set” message on the LCD.
Step 5: Talk with the EchoKitYou’re now ready to interact with your EchoKit device!
- Press the K0 button to start voice input.
- Once you see “Listening” on the screen, speak to the device.
- The EchoKit will process your speech using ASR, send the text to the LLM for a response, and then use TTS to speak the reply back to you.
Once you’ve built the base system, try expanding it:
- Add MCP servers to trigger smart actions.
- Integrate IoT control — lights, sensors, motors, etc.
- Experiment with different TTS or LLM providers.
- Create custom prompts for personality or domain-specific behavior.
You can find full documentation and source code at:👉 https://echokit.dev/










Comments