Our V0.1 project successfully demonstrated a token-efficient, privacy-conscious AI assistant that used predefined, button-selected prompts (the "No-Mic" approach). It was a great concept, but the limitations were clear: the bot couldn't respond to anything new.
V0.2 shatters that limitation. By upgrading our hardware, we can now capture and process live, unscripted user speech, turning this into a truly conversational AI device. This migration guide focuses on the critical hardware and software changes required for this massive leap.
How V0.2 Works: The New Architecture

The client-server architecture remains, but the User Input step is fundamentally changed:
- User Input (NEW): The user presses Button 1 to start recording. The INMP441 Microphone captures the audio stream.
- Recording Control: The user can press Button 2 to manually stop recording, or it will stop automatically after 6 seconds. The inbuilt RGB LED flashes to indicate the listening status.
- Client Processing: The ESP32-S3, utilizing its expanded memory, processes and prepares the captured audio data.
- Server Processing: The audio is sent to a speech-to-text service (Gemini 2.5 Flash-Lite); the transcribed text is then passed to an AI service (again Gemini 2.5 Flash-Lite) to generate a response. The AI response is then sent to gTTS for text-to-speech, and the resulting audio is finally streamed back to the client.
- Audio Playback: The server streams the audio back to the ESP32. The ESP32 utilizes its I2S peripheral to play this stream through the MAX98357A amplifier and speaker.
- Status Updates: The OLED display shows "Thinking..." during server communication and "Speaking..." during audio playback, with corresponding LED indicators.
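Before the captured audio can be transcribed, the raw 16-bit PCM samples from the INMP441 need to be wrapped in a WAV container that the STT service can consume. The actual server code may do this differently, but a minimal stdlib-only sketch (assuming 16 kHz mono, 16-bit samples, which are typical values for this kind of capture and not confirmed by the repository) looks like:

```python
import struct

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Wrap raw little-endian PCM bytes in a minimal 44-byte WAV (RIFF) header."""
    byte_rate = sample_rate * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",
        b"fmt ", 16,               # fmt chunk size
        1,                         # audio format: 1 = uncompressed PCM
        channels, sample_rate,
        byte_rate, block_align, bits_per_sample,
        b"data", len(pcm),
    )
    return header + pcm
```

The same bytes can then be handed to the transcription call without ever touching the filesystem.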
Hardware Components

- Microcontroller: ESP32-S3 N16R8
- Display: 0.96" OLED Display (SSD1306)
- Audio Output: MAX98357A I2S Class-D Amplifier + 8-ohm speaker
- Audio Input: INMP441 I2S Microphone
- User Input: 2x Tactile Buttons
- Visual Cues: Onboard RGB LED
- Miscellaneous: Breadboard, jumper wires, 1A USB power supply
The core of the software change is the Speech-to-Text (STT) integration on the ESP32-S3 and the management of audio streaming via the new microphone.
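The streaming side is mostly bookkeeping: the firmware reads fixed-size buffers from the I2S peripheral and ships them off until the recording stops. The buffer size below is illustrative rather than taken from the actual firmware, but the slicing logic is the same idea in Python:

```python
def chunk_audio(pcm: bytes, chunk_size: int = 1024):
    """Yield fixed-size chunks of a PCM byte stream, zero-padding the
    last chunk with silence so every buffer is full-length."""
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        if len(chunk) < chunk_size:
            chunk += b"\x00" * (chunk_size - len(chunk))
        yield chunk
```

On the ESP32-S3 this happens in C++ against DMA buffers, but the padding concern is identical: a partially filled final buffer must not carry stale samples.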
1. Python Server Setup
- Ensure Python 3 is installed.
- Create and activate a virtual environment.
- Install dependencies: pip install -r requirements.txt
- Create a .env file containing GEMINI_API_KEY="YOUR_API_KEY_HERE".
- Run the server: python server.py.
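server.py presumably loads the key with a library like python-dotenv, but the .env format itself is trivial. This stdlib-only sketch shows what that loading step amounts to (the filename and GEMINI_API_KEY variable match the setup above; the function name is ours):

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY="value" lines from a .env file into os.environ,
    without overwriting variables that are already set."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Keeping the key in .env (and out of version control) means the same server code runs unchanged on any machine.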
2. ESP32 Firmware Setup (PlatformIO)

Instead of the Arduino IDE, we're using PlatformIO (PIO), a powerful, professional tool built on top of VS Code. PlatformIO simplifies dependency management, board configuration, and building for the ESP32-S3.
- Install Visual Studio Code (VS Code), if you haven't already.
- Open VS Code and navigate to the Extensions view (Ctrl+Shift+X or Cmd+Shift+X).
- Search for and install the PlatformIO IDE extension.
- Download or clone the V0.2 firmware project from the repository (link shared below)
- Open the project folder in VS Code. PlatformIO will automatically recognize the project structure.
- Ensure the `platformio.ini` file specifies the correct board and settings for the increased memory (link shared below).
- Navigate to the `src` folder and open the main firmware file (`main.cpp`).
- Update the Python server URL (`SERVER_IP`) where your code will send the audio data/requests.
- Click the PlatformIO Build button (check mark icon ✅) in the status bar at the bottom of VS Code.
- Connect your ESP32-S3-N16R8 board to your PC.
- Click the PlatformIO Upload button (right arrow icon ➡️) in the status bar.
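For reference, an ESP32-S3 N16R8 build typically needs the PSRAM-related settings enabled in `platformio.ini`. The exact file ships with the repository; the board name and flags below are common choices for this module, not copied from the project:

```ini
[env:esp32-s3-devkitc-1]
platform = espressif32
board = esp32-s3-devkitc-1
framework = arduino
monitor_speed = 115200
; N16R8 = 16 MB flash, 8 MB octal PSRAM
board_build.arduino.memory_type = qio_opi
build_flags =
    -DBOARD_HAS_PSRAM
```

Without the PSRAM flags, large audio buffers can exhaust the internal RAM, which is exactly the constraint this board upgrade is meant to remove.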
Get the full firmware and server code for V0.2 here: [Link to GitHub Repository/Readme]
This V0.2 project moves beyond the limitations of its predecessor and provides a robust, real-world example of building a high-performance, voice-enabled AI device on an accessible microcontroller platform. Happy building!