Published April 8, 2026 © GPL3+

Pocket Size ESP32 AI Voice Assistant With Xiaozhi

A pocket-size AI voice assistant that can listen, respond, and even display information on a small screen. It feels more like a real device than just a project.

Beginner · Protip · 1 hour · 11,730 views

Things used in this project

Hardware components

ESP32-S3 development board

INMP441 microphone

MAX98357A amplifier

ElectroPeak 0.96" OLED 64x128 Display Module

NextPCB Custom PCB Board

Software apps and online services

Arduino IDE

Xiaozhi

Espressif ESP-IDF

Hand tools and fabrication machines

Soldering iron (generic)

Story

I’ve always been interested in voice assistants, but most of the time they are stuck inside phones or big smart speakers. I wanted something smaller… something I could actually keep on my desk or carry around with me.

So I decided to build my own using an ESP32. At first it felt a bit complicated, but after discovering Xiaozhi firmware, things became much easier. I didn’t have to deal with heavy coding — just flash the firmware and set it up.

The result is a pocket-size AI voice assistant that can listen, respond, and even display information on a small screen. It feels more like a real device than just a project.

What You Can Do With It

This little device can actually be used in many ways:

You can use it as a simple smart home assistant to check weather or control devices
It works as a learning companion for asking questions or quick explanations
You can use it for fun interactions like chatting, stories, or basic entertainment
It’s also great as a development platform if you want to experiment with AI and voice projects
And it can act as an IoT interface inside bigger smart systems

Supplies

I didn’t build this in one go.

First I tested everything on a breadboard, then I made a compact PCB version.

Breadboard Version

For testing, I used only basic parts:

ESP32-S3 development board (Amazon.com / Aliexpress.com)
INMP441 microphone (Amazon.com / Aliexpress.com)
MAX98357A amplifier (Amazon.com / Aliexpress.com)
Small speaker (4Ω–8Ω, 2–3W) (Amazon.com)
OLED display (SSD1306, 0.91” / 0.96”) (Amazon.com)
Push button
Breadboard + jumper wires

This was enough to test voice input, AI response, and audio output.

PCB Version

After it worked, I moved to a PCB to make it small and clean.

Components I used on PCB:

ESP32-S3 (main controller)
INMP441 digital microphone
MAX98357A I2S amplifier
OLED 0.96” (SSD1306 – HS96L03W2C03)

Buttons & Switch:

Reset button (K2-1109DE)
3 tactile buttons (Boot, Vol+, Vol− – KH-4.5X4.5X6H)
Power switch (SK12D07L3B)

Connectors:

2.54mm female headers (I2C / OLED / ESP32)
2x22 pin headers for ESP32 mounting
Speaker connector (1.25mm 2-pin – ZX-MX1.25)

This made the whole project much more compact and practical. Now it actually feels like a proper pocket device instead of a prototype.

How It Works

Before building, it helps to understand what’s actually going on behind the scenes.

This project doesn’t run AI directly on the ESP32. Instead, it uses cloud-based processing, which is why even a small board can act like a smart assistant.

The working flow is like this:

First, you speak into the microphone.

The ESP32 records your voice using I2S audio input.

Then it sends that audio over WiFi to the Xiaozhi cloud.

On the cloud side, all the heavy work happens:

Your voice is converted into text (Speech-to-Text)
The AI model understands it and generates a reply
That reply is converted back into voice (Text-to-Speech)

After that, the audio response is sent back to the ESP32.

Finally, the ESP32 plays it through the speaker.

So in simple terms, the ESP32 is handling:

Recording and playing audio
Connecting to WiFi
Managing the device

And the actual AI thinking happens in the cloud.

That’s why even a small microcontroller like ESP32 can run a powerful voice assistant without needing heavy processing on the device itself.
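
To make the flow above more concrete, here is a small host-side sketch in Python. It is only a simulation: every function name, the canned reply, and the fake "audio" bytes are hypothetical stand-ins, since the real Xiaozhi firmware and cloud protocol are not described in this article.

```python
"""Host-side sketch of the voice round trip. All names are hypothetical stand-ins."""


def record_audio() -> bytes:
    # On the device this would be I2S samples read from the microphone.
    return b"hello"


def speech_to_text(audio: bytes) -> str:
    # Cloud step 1: placeholder that simply decodes the fake audio bytes.
    return audio.decode("ascii")


def generate_reply(text: str) -> str:
    # Cloud step 2: placeholder for the AI model.
    return f"You said: {text}"


def text_to_speech(text: str) -> bytes:
    # Cloud step 3: placeholder that encodes the reply as fake audio bytes.
    return text.encode("ascii")


def cloud_round_trip(audio: bytes) -> bytes:
    # The three cloud steps chained in the order listed above.
    return text_to_speech(generate_reply(speech_to_text(audio)))


def play_audio(audio: bytes) -> str:
    # On the device this would be written to the I2S amplifier.
    return audio.decode("ascii")


if __name__ == "__main__":
    print(play_audio(cloud_round_trip(record_audio())))
```

The point of the sketch is the split: only `record_audio` and `play_audio` would live on the ESP32, and everything inside `cloud_round_trip` happens on the server.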

Step 2: Understanding the Hardware

Before starting the wiring, I spent some time understanding what each part actually does. It makes things much easier later.

The INMP441 microphone is a digital mic that uses I2S. Instead of giving analog signals, it sends digital audio directly to the ESP32. Because of that, it has less noise and gives better audio quality.

The MAX98357A amplifier also works on I2S. It takes digital audio from the ESP32 and directly drives the speaker. The good part is that it doesn’t need any extra DAC, so the circuit stays simple.

The ESP32-S3 is the main controller of the whole project. It handles WiFi, records audio from the mic, sends it to the cloud, and plays back the response.

The OLED display is just for feedback. It shows things like status, messages, or connection info. Not required, but it makes the project feel more complete.

One interesting thing here is that both input and output use I2S.

So the same type of audio interface is used for the microphone and the speaker, which makes it perfect for this kind of voice project.
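
Since both sides share I2S, it helps to know how fast the bit clock has to run. The sketch below uses the standard relationship for a stereo I2S frame (sample rate × bits per slot × 2 slots). The 16 kHz sample rate and 32-bit slot width are my assumptions for illustration; the article does not state what the Xiaozhi firmware actually uses.

```python
def i2s_bit_clock_hz(sample_rate_hz: int, bits_per_slot: int, slots: int = 2) -> int:
    """Bit clock needed for an I2S frame with `slots` slots per sample period."""
    return sample_rate_hz * bits_per_slot * slots


if __name__ == "__main__":
    # Assumed values: 16 kHz voice audio, 32-bit slots, left + right slot.
    print(i2s_bit_clock_hz(16_000, 32))  # 1024000
```

Even with a mono microphone, the frame still carries two slots, so the clock rate does not halve.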

Step 3: ESP32-S3 Development Board

The ESP32-S3-DevKitC-1 is the main controller used in this project. It’s a powerful and flexible development board based on Espressif’s ESP32-S3 chip.

It comes with built-in WiFi and Bluetooth, which makes it perfect for IoT and AI-based projects. It has enough performance to handle things like audio processing and communication with cloud services.

At its core, it uses a dual-core Xtensa LX7 processor running up to 240 MHz. It also supports basic hardware acceleration for AI tasks, which helps in applications like voice interaction.

Specifications

Microcontroller: ESP32-S3
CPU: Dual-core Xtensa LX7 Running at up to 240 MHz
USB Ports: Dual Type-C For Data and Power
Wireless Connectivity: Wi-Fi (802.11b/g/n), Bluetooth (BLE)
Flash: 8 MB (Quad SPI)
SPI Voltage: 3.3 V
Module Integrated : ESP32-S3-WROOM-1-N16
Frequency: 2.4 GHz
Interface Type: GPIO, UART, USB
Protocol Supported: 802.11 b/g/n
Connectivity Technology: Bluetooth, Wi-Fi, USB
GPIO Pins: 34

Step 4: MAX98357A I2S Audio Amplifier

The MAX98357A is a small but powerful digital audio amplifier used to drive the speaker in this project.

Instead of using analog audio, it works directly with I2S digital audio, which makes the circuit much simpler and cleaner. You don’t need a separate DAC because this module already converts digital audio into sound and amplifies it at the same time.

That’s why it’s commonly used in smart speakers, portable audio devices, and voice projects like this.

How It Works (Simple Idea)

The ESP32 sends digital audio data through I2S.

The MAX98357A:

Converts that digital signal into analog audio
Amplifies it
Sends it directly to the speaker

So basically, it combines DAC + Amplifier in one module.

Key Technical Details

Supply Voltage (V_DD): 2.5V to 5.5V
Output Power: 3.2W into 4Ω at 5V
Digital Audio Interface: I²S
THD+N: 0.1% (Typical at 1kHz)
SNR: 90dB (Typical)
Efficiency: Up to 92%
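
As a rough sanity check on the power supply, the figures above can be turned into a worst-case current estimate. This is only a back-of-the-envelope sketch: it assumes the amplifier is delivering its full 3.2 W at the best-case 92% efficiency, which normal voice playback will rarely reach.

```python
def supply_current_a(output_power_w: float, efficiency: float, supply_v: float) -> float:
    """Estimate supply current from output power, efficiency, and supply voltage."""
    input_power_w = output_power_w / efficiency  # power drawn from the supply
    return input_power_w / supply_v


if __name__ == "__main__":
    # Datasheet-style figures quoted above: 3.2 W into 4 ohms at 5 V, 92% efficient.
    print(f"{supply_current_a(3.2, 0.92, 5.0):.2f} A")  # 0.70 A
```

So at full volume the amplifier alone could pull around 0.7 A from a 5 V supply, which is worth keeping in mind when choosing a USB source or battery.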

MAX98357A → ESP32

VIN → 3.3V
GND → GND
DIN → GPIO7 (audio data)
BCLK → GPIO15
LRC → GPIO16
GAIN → GND
SD → 3.3V

Speaker connects directly to:

SPK+
SPK−

Step 5: INMP441 Digital Microphone (I2S)

The INMP441 is the microphone used in this project to capture your voice.

It’s a digital MEMS microphone, which means it already has built-in amplification and analog-to-digital conversion. So instead of sending noisy analog signals, it directly sends clean digital audio to the ESP32 using I2S.

That’s why it works really well for voice projects like this.

How It Works

When you speak, the microphone captures the sound and converts it into digital data.

This data is sent to the ESP32 through the I2S interface using three main signals:

SCK → clock
WS → left/right sync
SD → audio data

So no extra ADC or complex circuit is needed.
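
To show what that digital data looks like, here is a small Python sketch that turns raw bytes into signed samples. The INMP441 outputs 24-bit two's-complement samples, MSB first. The sketch assumes they arrive left-justified in 32-bit little-endian words, which is a common way for I2S drivers to deliver them; the buffer layout used by the actual firmware is not covered in this article, and the function name is my own.

```python
import struct


def unpack_inmp441(raw: bytes) -> list[int]:
    """Convert 32-bit little-endian I2S words into signed 24-bit samples."""
    count = len(raw) // 4  # ignore any trailing partial word
    words = struct.unpack(f"<{count}i", raw[: count * 4])
    # The 24 data bits sit in the top of each word, so shift out the low 8 bits.
    # Python's >> on a negative int keeps the sign, like an arithmetic shift.
    return [word >> 8 for word in words]


if __name__ == "__main__":
    raw = struct.pack("<3I", 0x7FFFFF00, 0xFFFFFF00, 0x80000000)
    print(unpack_inmp441(raw))  # [8388607, -1, -8388608]
```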

Why I Used This Mic

Digital output → less noise compared to analog mics
Works directly with ESP32 (I2S)
Small and easy to use
Good sensitivity for voice input
Low power consumption

Here’s how I connected it to the ESP32:

INMP441 → ESP32

VDD → 3.3V
GND → GND
WS → GPIO4 (word select)
SCK → GPIO5 (clock)
SD → GPIO6 (data out)
L/R → GND (for mono / left channel)

Important Notes

Works on 3.3V only
L/R pin decides left or right channel (I used GND for simple setup)
It’s a mono microphone
Try to keep wiring short for better audio quality

Step 6: 128x32 OLED Display

For displaying information, I used a small OLED screen based on the SSD1306 driver.

These displays are very common in DIY projects because they are small, clear, and easy to use. Even though the size is tiny, the text looks sharp and easy to read.

It connects using the I2C interface, so only two data lines are needed, which makes wiring simple.

High contrast → text is very clear
Low power → good for portable projects
Small size → perfect for compact builds
Easy to use → lots of libraries available
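
If you are curious how such a tiny screen is driven, the SSD1306 stores its image in "pages" of 8 pixel rows, where each byte is a vertical strip of 8 pixels. The sketch below models that layout for a 128x32 panel in plain Python. It is a host-side illustration with function names of my own, not the display code used by the Xiaozhi firmware.

```python
WIDTH, HEIGHT = 128, 32  # panel size from this step's title


def new_framebuffer() -> bytearray:
    """One bit per pixel: 128 * 32 / 8 = 512 bytes."""
    return bytearray(WIDTH * HEIGHT // 8)


def set_pixel(fb: bytearray, x: int, y: int, on: bool = True) -> None:
    """Set or clear one pixel using the SSD1306 page layout."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError(f"pixel ({x}, {y}) is outside the display")
    index = (y // 8) * WIDTH + x  # page number * width + column
    mask = 1 << (y % 8)           # bit 0 is the top row of the page
    if on:
        fb[index] |= mask
    else:
        fb[index] &= ~mask & 0xFF


if __name__ == "__main__":
    fb = new_framebuffer()
    set_pixel(fb, 127, 31)
    print(len(fb), hex(fb[511]))  # 512 0x80
```

A full frame is only 512 bytes, which is one reason these displays are so easy to run from a microcontroller.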

The connections should be as follows:

ESP32 → OLED

3.3V → VCC
GND → GND
GPIO41 → SDA
GPIO42 → SCL
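
With three modules wired up, it is easy to accidentally reuse a GPIO. The sketch below collects the pin assignments from Steps 4 to 6 into one table and checks for duplicates. The pin numbers come from this article; the dictionary keys and the helper function are just my own naming for illustration.

```python
# GPIO assignments as listed in Steps 4, 5, and 6 of this article.
PIN_MAP = {
    "MAX98357A DIN": 7,
    "MAX98357A BCLK": 15,
    "MAX98357A LRC": 16,
    "INMP441 WS": 4,
    "INMP441 SCK": 5,
    "INMP441 SD": 6,
    "OLED SDA": 41,
    "OLED SCL": 42,
}


def find_conflicts(pin_map: dict[str, int]) -> dict[int, list[str]]:
    """Return every GPIO that is assigned to more than one signal."""
    by_pin: dict[int, list[str]] = {}
    for signal, gpio in pin_map.items():
        by_pin.setdefault(gpio, []).append(signal)
    return {gpio: signals for gpio, signals in by_pin.items() if len(signals) > 1}


if __name__ == "__main__":
    conflicts = find_conflicts(PIN_MAP)
    print("No conflicts" if not conflicts else conflicts)
```

If you change any pin for your own board, updating this table first is a quick way to catch a clash before powering anything on.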

Step 7: Make the Circuit

Now just follow the circuit diagram shown in the image above.

I’ve already explained all the connections for each component, so you can wire them step by step. To make things easier, I also built it on a breadboard.

Here you can see the full setup in the photo. Just connect everything carefully and double-check the pins before powering it on.

Flashing the Xiaozhi Firmware

Now let’s upload the Xiaozhi firmware to the ESP32.

1. Download Files

First, download these:

ESP Flash Download Tool
https://docs.espressif.com/projects/esp-test-tools/en/latest/esp32/production_stage/tools/flash_download_tool.html
My Project Repository (Firmware + Files)
https://github.com/Rau7han/esp32-ai-pocket-assistant

This repo has everything you need, including the firmware file.

2. Setup Flash Tool

Extract all the downloaded files
Open the ESP Flash Download Tool
Select chip type → ESP32-S3

3. Flash the Firmware

Now comes the main part.

Click on the “…” button and select the firmware file (merged-binary.bin) from the project folder
Set address to: 0x0
Tick the checkbox next to the file
Select the correct COM port

Then:

Click Erase and wait until it finishes
Click Start

Wait a few minutes until it shows FINISH.
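
If you prefer the command line, the same erase-then-write sequence can probably be done with Espressif's esptool instead of the GUI. This is not the method I used above, so treat it as an untested alternative. The sketch only builds and prints the commands; the port name is an example, esptool has to be installed separately, and subcommand spellings differ slightly between esptool releases, so check `esptool.py --help` on your machine.

```python
import shlex


def esptool_commands(port: str, firmware: str = "merged-binary.bin") -> list[list[str]]:
    """Build erase and write commands that mirror the GUI steps above."""
    base = ["esptool.py", "--chip", "esp32s3", "--port", port]
    return [
        base + ["erase_flash"],                  # same as clicking Erase
        base + ["write_flash", "0x0", firmware], # same address as in the GUI
    ]


if __name__ == "__main__":
    # "COM3" is only an example; use the port your board shows up on.
    for command in esptool_commands("COM3"):
        print(shlex.join(command))
```

Printing the commands first lets you check the port and file name before actually running anything against the board.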

WiFi Setup and Device Activation

After flashing the firmware, it’s time to connect the device to WiFi and activate it.

Power On and Connect

Turn on your device.

After booting, it will create its own WiFi hotspot like:

Xiaozhi-XXXX

Open WiFi settings on your phone or laptop and connect to it.

Open Setup Page

Once connected, open your browser and go to:

192.168.4.1

This will open the configuration page.

Connect to Your WiFi

Select your WiFi network (2.4GHz only)
Enter password
Click Connect

If everything is correct, the device will connect and restart automatically.

Get Device Code

After restarting, the device will speak (or show) a 6-digit code.

This code is needed to link your device.

Add Device to Xiaozhi

Now go to:

👉 http://xiaozhi.me

Create an account or log in
Open Console
Click Add Device
Enter the 6-digit code

Now your device will appear in your dashboard.
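
If you ever script around this step (for example, a small helper that asks for the code before opening the console), a quick format check can catch typos early. This is a hypothetical helper of my own; it only checks that the input is exactly six digits and knows nothing about how Xiaozhi validates codes on its side.

```python
import re


def is_valid_device_code(code: str) -> bool:
    """True if the input is exactly six ASCII digits (surrounding spaces ignored)."""
    return re.fullmatch(r"[0-9]{6}", code.strip()) is not None


if __name__ == "__main__":
    print(is_valid_device_code("123456"))  # True
    print(is_valid_device_code("12345"))   # False
```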

Customize Your Assistant

You can now customize your AI:

Change name
Select language
Choose voice
Set personality (fun, professional, etc.)
Select AI model

Save the settings.

Testing on Breadboard

Now that everything is set up, it’s time to test it on the breadboard.

Restart the device once, and wait for it to connect to WiFi.

After that, press the button (or use the wake word, if enabled) and say something simple like:

“Hello” or “What can you do?”

If everything is connected properly:

The microphone will capture your voice
After a second, you should hear a reply from the speaker
The OLED may show status or messages
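
One way to picture what the device is doing during this test is as a small state machine. The sketch below is my own simplified model; the state and event names are hypothetical and are not taken from the Xiaozhi firmware.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()   # waiting for the cloud reply
    SPEAKING = auto()


# Hypothetical events that move the assistant from one state to the next.
TRANSITIONS = {
    (State.IDLE, "button_pressed"): State.LISTENING,
    (State.LISTENING, "speech_ended"): State.THINKING,
    (State.THINKING, "reply_received"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.IDLE
    for event in ["button_pressed", "speech_ended", "reply_received", "playback_done"]:
        state = step(state, event)
        print(event, "->", state.name)
```

If the device seems stuck, thinking about which of these stages it never leaves (for example, listening but never replying) can point you toward WiFi, microphone, or speaker wiring.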

The first time I tested it, I was honestly not sure if it would work 😅

But when it replied back, that felt really satisfying.

If It Doesn’t Work

Don’t worry, just check a few things:

WiFi is connected properly
Wiring is correct (especially I2S pins)
Speaker and amplifier connections
Power supply is stable

Most issues usually come from wiring mistakes.

PCB Version

After testing everything on the breadboard, I realized the wiring was getting really messy and not practical for daily use.

So I decided to move to a PCB to make it cleaner, more stable, and compact.

I didn’t design this PCB myself. I found a really nice open-source design online, so full credit goes to the original creator, surferlong.

I liked this design because it already had everything well organized — proper placement for the ESP32, microphone, amplifier, buttons, and display. It also made the whole setup small enough to actually carry around.

Instead of redesigning everything, I directly used this PCB and ordered it. It saved a lot of time and effort.

Once I got the board, I just assembled all the components on it, and the difference was huge compared to the breadboard version.

Everything became:

much cleaner
more reliable
and way more compact

PCB Fabrication – NextPCB

For this project, I ordered my PCB from NextPCB using the open-source Gerber files.

I went with the standard blue solder mask with white silkscreen. The boards arrived within about a week, and the quality was really good. The finish was clean, holes were accurate, and everything aligned perfectly, making the final build look much more professional compared to the breadboard setup.

NextPCB also provides a tool called HQDFM, which I found quite useful. It’s a free online Gerber viewer and DFM (Design for Manufacturing) analysis tool. You can upload your design and quickly check for possible issues before ordering.

I personally like using it because it avoids waiting for manufacturer feedback. You can do a quick self-check and fix mistakes early. The online viewer is fine for a basic preview, and for more detailed analysis (especially PCBA), there’s also a desktop version available.

They also have an accelerator program where you can get free assembled PCBs for selected projects:

https://www.nextpcb.com/blog/rp2040-free-pcba-prototypes-nextpcb-accelerator

Overall, my experience was smooth. Good PCB quality, fast delivery, and helpful tools like HQDFM made the whole process easier. If you’re building PCB-based projects, it’s worth checking them out.

PCB Assembly

For this step, you’ll need a basic soldering setup: a soldering iron, solder wire, a nipper, and optionally a multimeter for checking connections.

Start by soldering components in order of their height. This makes assembly easier and keeps everything aligned properly.

Step-by-Step Assembly

1. Solder the pin headers first
Insert the female headers into the PCB (as shown in the images) and solder them from the back side. These will hold the ESP32-S3 board in place.
2. Mount and solder the push buttons
Place the buttons in their respective positions on the PCB and solder all pins properly.
3. Add smaller components (like modules)
Solder modules like the microphone board carefully, making sure all pins are properly connected.
4. Attach the ESP32-S3 board
Once the headers are in place, mount the ESP32-S3 board on top. Make sure the alignment is correct before soldering (if required).
5. Solder the TP4056 charging module (back side)
Flip the PCB and solder the TP4056 module on the back side as shown in the images. This handles battery charging, so make sure the polarity (+ / −) is correct and the solder joints are strong.
6. Connect the speaker and other connections
Connect the speaker wires to the PCB pads or the connector provided.

Working

After assembling everything, the device powered up without any issues.

Since I had already flashed the firmware earlier, it started working immediately after turning it on. The ESP32-S3 boots up, initializes all modules, and the AI assistant becomes ready to use.

The microphone captures voice input, the audio is processed in the cloud, and the device responds through the speaker. The buttons also work for control, and overall the system feels smooth and responsive.

Everything came together nicely once moved from breadboard to PCB — much cleaner, compact, and reliable.

Conclusion

This project is a simple but powerful example of combining AI with embedded hardware.

It’s not just a circuit; it’s a small AI device that can listen, respond, and interact in real time. And the best part is that the device side runs entirely on an ESP32-S3, with the heavy processing handled in the cloud, so no complex setup is needed.

I didn’t design the PCB myself, but using an open-source design and building on top of it made the whole process much faster and easier.

For me, this project was more about learning and experimenting — understanding how voice, AI, and hardware can work together in a compact form.

It also shows that you don’t need expensive hardware to build something interesting. With the right components and open-source resources, you can create your own smart devices.

Overall, it turned out to be a clean, portable, and practical build.

Credits

Raushan kr.

Maker | Developer | Content Creator
