
Published October 5, 2025 © GPL3+

ESP32 AI Voice Assistant

In this project, I will share with you an ESP32-S3-based AI voice assistant.

Intermediate · Full instructions provided · 1 hour

Things used in this project

Hardware components

Seeed Studio XIAO ESP32S3 Plus

SparkFun MEMS Microphone Breakout - INMP401 (ADMP401)

Adafruit MAX98357A

Tactile Switch, Top Actuated

DFRobot ST7789

Software apps and online services

Arduino IDE

Story

🤖 Your Free DIY AI Voice Assistant (ESP32-S3 & HuggingFace)

This project guides you through building your very own AI Voice Assistant using completely free tools and AI models. The ESP32-S3 development board handles voice recording and audio playback, while all the complex AI processing runs on a HuggingFace Space server.

Don't forget to subscribe to the channel!

Video Tutorial:

✨ Features

100% Free: No paid APIs, services, or subscriptions are required.
HuggingFace Integration: Utilizes a custom server setup on HuggingFace to combine Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS).
Hardware: Built around the powerful ESP32-S3 development board.
Language: Currently supports English only (multi-language support is a future goal).

⚙️ Hardware Requirements

The use of PSRAM is critical for voice recording and processing tasks, so make sure your board has it!
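Sizing the recording buffer shows why PSRAM matters. The host-side C++ sketch below assumes a 16 kHz, 16-bit, mono format and a 10-second maximum recording. Both values are hypothetical and are not taken from this project's firmware. At those settings, a single recording needs 320,000 bytes. That is a large share of the ESP32-S3's 512 KB of internal SRAM, which the Wi-Fi stack and the rest of the firmware also need.

The sketch also builds a standard 44-byte WAV header. This rests on a further assumption: that the firmware wraps the raw PCM as WAV before uploading it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical recording format (not taken from this project's firmware):
// 16 kHz, 16-bit, mono PCM, a common input format for speech-to-text.
constexpr uint32_t kSampleRate = 16000;
constexpr uint16_t kBitsPerSample = 16;
constexpr uint16_t kChannels = 1;
constexpr uint32_t kMaxSeconds = 10;  // hypothetical cap on button hold time

// Raw PCM bytes needed for `seconds` of audio in the format above.
constexpr uint32_t pcmBytes(uint32_t seconds) {
  return kSampleRate * (kBitsPerSample / 8u) * kChannels * seconds;
}

// 16000 samples/s * 2 bytes * 1 channel * 10 s = 320,000 bytes.
static_assert(pcmBytes(kMaxSeconds) == 320000, "unexpected buffer size");

// Builds a canonical 44-byte RIFF/WAVE header (PCM, format tag 1) for
// `dataBytes` of audio. All multi-byte fields are little-endian.
std::array<uint8_t, 44> wavHeader(uint32_t dataBytes) {
  assert(dataBytes <= 0xFFFFFFFFu - 36);  // RIFF size field must not overflow
  std::array<uint8_t, 44> h{};
  std::size_t i = 0;
  auto tag = [&](const char* s) {
    for (int k = 0; k < 4; ++k) h[i++] = static_cast<uint8_t>(s[k]);
  };
  auto u16 = [&](uint16_t v) {
    h[i++] = static_cast<uint8_t>(v & 0xFF);
    h[i++] = static_cast<uint8_t>(v >> 8);
  };
  auto u32 = [&](uint32_t v) {
    for (int k = 0; k < 4; ++k) h[i++] = static_cast<uint8_t>(v >> (8 * k));
  };

  const uint16_t blockAlign = static_cast<uint16_t>(kChannels * (kBitsPerSample / 8));
  tag("RIFF"); u32(36 + dataBytes); tag("WAVE");
  tag("fmt "); u32(16);            // fmt chunk size for PCM
  u16(1);                          // audio format: PCM
  u16(kChannels);
  u32(kSampleRate);
  u32(kSampleRate * blockAlign);   // byte rate
  u16(blockAlign);
  u16(kBitsPerSample);
  tag("data"); u32(dataBytes);
  return h;
}
```

On the board, the PCM buffer itself would live in PSRAM. The header would be sent just before it.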

| Component | Details | Note |
|---|---|---|
| Development Board | ESP32-S3 (16MB Flash, 8MB PSRAM) | ⚠️ PSRAM is mandatory for the code to function correctly. |
| Display | ST7789 TFT Display | Check the separate display tutorial link in the video description for configuration. |
| Microphone | INMP441 I2S MEMS Microphone | Crucial: Add a capacitor between Ground and VCC on the microphone. |
| Audio Amp | MAX98357A I2S Audio Amplifier | Connected to a half-watt, 8-ohm speaker. |
| Trigger | Tactile Button | Used to start and stop the voice recording. |
| Wiring | Separate I2S Lines | Dedicated I2S GPIO pins are used for the microphone and the amplifier to simplify control and avoid noise. |
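The ESP32-S3 has two I2S controllers, so the microphone and the amplifier can each have their own port and pins. Below is a minimal sketch of such a pin map, with a compile-time check that the ports differ and that no GPIO is shared. The port and GPIO numbers are hypothetical placeholders, not this project's actual wiring, so substitute your own.

```cpp
#include <cstddef>

// One I2S port and its three signal pins. All numbers used below are
// hypothetical placeholders, not this project's actual wiring.
struct I2sPins {
  int port;  // I2S controller index (the ESP32-S3 has two: 0 and 1)
  int bclk;  // bit clock (SCK on the INMP441, BCLK on the MAX98357A)
  int ws;    // word select (WS on the INMP441, LRC on the MAX98357A)
  int data;  // SD from the microphone, or DIN into the amplifier
};

constexpr I2sPins kMicPins{0, 7, 8, 9};  // hypothetical GPIOs
constexpr I2sPins kAmpPins{1, 1, 2, 3};  // hypothetical GPIOs

// True when the two devices use different I2S ports and no GPIO appears twice.
constexpr bool isSeparateI2s(const I2sPins& a, const I2sPins& b) {
  if (a.port == b.port) return false;
  const int pins[] = {a.bclk, a.ws, a.data, b.bclk, b.ws, b.data};
  for (std::size_t i = 0; i < 6; ++i)        // 6 = total signal pins
    for (std::size_t j = i + 1; j < 6; ++j)
      if (pins[i] == pins[j]) return false;
  return true;
}

// Fails the build if the placeholder wiring ever overlaps.
static_assert(isSeparateI2s(kMicPins, kAmpPins), "mic and amp must not share I2S resources");
```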

☁️ Server Setup (HuggingFace Space)

The AI magic happens on a free HuggingFace Space.

Create a HuggingFace Account and go to "New Space".

Space Configuration:

Space SDK: Choose Docker.
Template: Select Blank.
Space Hardware: Use CPU Basic.

Upload Server Files: Copy the following files from this repository and create them in your new HuggingFace Space (Files tab -> Contribute -> Create new file):

Dockerfile
app.py
docker-compose.yml
requirements.txt
Create an Access Token: Go to your profile's Access Tokens, create a "Write" token, and copy it.

Set a Secret: In your Space's Settings tab, scroll down to Secrets and create a new secret:

Name (must be exact): HF_TOKEN
Value: Paste your copied token here.
Wait for the server state to go from "Building" to "Running". Check the Logs if any error occurs.

💻 Arduino Firmware Setup

Install Libraries: Download and install all required libraries (links are in the video description). Note: The ESP8266Audio library works fine with the ESP32.

Code Configuration:

Enter your WiFi SSID and Password.
Update the Server URL with your own HuggingFace Space URL. It is case-sensitive, so copy it exactly.
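Here is a minimal sketch of these configuration values. It uses hypothetical names (WIFI_SSID, WIFI_PASSWORD, SERVER_URL, ENDPOINT) and a hypothetical /process endpoint, so check the project's sketch for the real identifiers and path. Direct Space URLs typically take the form https://<username>-<space-name>.hf.space. Under RFC 3986, only the scheme and host of a URL are case-insensitive, so the helper leaves the case of the path untouched. It only normalizes the slash between the base URL and the endpoint.

```cpp
#include <cstddef>
#include <string>

// Placeholder values: replace with your own. These names are hypothetical
// and may not match the identifiers used in the project's sketch.
constexpr const char* WIFI_SSID = "your-wifi-ssid";
constexpr const char* WIFI_PASSWORD = "your-wifi-password";
// Direct Space URLs typically look like https://<username>-<space-name>.hf.space
constexpr const char* SERVER_URL = "https://your-username-your-space.hf.space";
constexpr const char* ENDPOINT = "/process";  // hypothetical endpoint path

// True if the URL uses HTTPS (hf.space is served over HTTPS).
bool isHttps(const std::string& url) { return url.rfind("https://", 0) == 0; }

// Joins a base URL and an endpoint with exactly one '/' between them.
// Letter case is preserved, since URL paths are case-sensitive.
std::string joinUrl(std::string base, const std::string& path) {
  while (!base.empty() && base.back() == '/') base.pop_back();
  std::size_t start = 0;
  while (start < path.size() && path[start] == '/') ++start;
  return base + "/" + path.substr(start);
}
```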

ESP32 Board Settings:

Ensure PSRAM is enabled in the Tools menu.
Select a Partition Scheme that includes SPIFFS. LittleFS, which stores the downloaded reply audio, uses this same partition.
Upload the code to your ESP32-S3 board.

▶️ How to Use

Once the display shows "Assistant Ready", the system is operational.
Press and hold the tactile button to start recording your voice into PSRAM.
Release the button to stop recording and automatically send the audio to the server.
The AI server processes the request (STT -> LLM -> TTS) and sends the response audio back to the ESP32.
The ESP32 downloads the audio to LittleFS and automatically plays the answer on the speaker.

Enjoy building your AI Assistant! See you in the next project!