I modified the Xiaozhi firmware for the Seeed Xiao ESP32-S3 and replaced the default emoji UI with a fully custom animated face.
The project uses an I2S microphone for input, a MAX98357A I2S amplifier for speaker output, and a 128x64 OLED display driven by LVGL. I updated the firmware to remove subtitles and icon-based emotions and to render animated eyes and a mouth that react to device state instead.
The face includes idle movement, random blinking, a listening expression with asymmetric eyes, and a speaking animation in which the mouth opens and closes dynamically. All animations are non-blocking and driven by an LVGL timer, so they do not stall audio or network tasks.
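The non-blocking approach described above can be sketched as a small state machine that is advanced by a periodic tick. This is a minimal illustration, not the firmware's actual code: the `FaceAnimator` class, the `DeviceState` names, the pixel heights, and the timing constants are all assumptions made for the example. It also assumes a blink can happen in any state.

```cpp
#include <cassert>
#include <random>

// Hypothetical device states; the real firmware's state names may differ.
enum class DeviceState { Idle, Listening, Speaking };

// Illustrative pixel heights for a 128x64 display (assumed values).
constexpr int kEyeOpen = 16, kEyeRaised = 22, kEyeClosed = 2;
constexpr int kMouthClosed = 2, kMouthOpen = 12;
constexpr int kBlinkMs = 120;        // how long the eyes stay shut
constexpr int kMouthPeriodMs = 300;  // one open-and-close cycle

struct FaceFrame {
    int left_eye_h;
    int right_eye_h;
    int mouth_h;
};

class FaceAnimator {
public:
    explicit FaceAnimator(unsigned seed) : rng_(seed) { schedule_blink(); }

    void set_state(DeviceState s) { state_ = s; }

    // Advance the animation by dt_ms and return the frame to draw.
    // Never sleeps or waits on time, so a periodic timer can call it safely.
    FaceFrame tick(int dt_ms) {
        assert(dt_ms >= 0);
        phase_ms_ = (phase_ms_ + dt_ms) % kMouthPeriodMs;
        FaceFrame f{kEyeOpen, kEyeOpen, kMouthClosed};

        if (state_ == DeviceState::Listening) {
            f.right_eye_h = kEyeRaised;  // asymmetric eyes while listening
        } else if (state_ == DeviceState::Speaking) {
            // Triangle wave: mouth height ramps up, then back down.
            const int half = kMouthPeriodMs / 2;
            const int p = phase_ms_ < half ? phase_ms_ : kMouthPeriodMs - phase_ms_;
            f.mouth_h = kMouthClosed + (kMouthOpen - kMouthClosed) * p / half;
        }

        if (blink_left_ms_ > 0) {
            // Mid-blink: close both eyes for this frame.
            blink_left_ms_ -= dt_ms;
            f.left_eye_h = f.right_eye_h = kEyeClosed;
        } else {
            next_blink_ms_ -= dt_ms;
            if (next_blink_ms_ <= 0) {
                blink_left_ms_ = kBlinkMs;
                schedule_blink();
            }
        }
        return f;
    }

private:
    // Pick a random delay until the next blink (2 to 6 seconds, assumed).
    void schedule_blink() {
        std::uniform_int_distribution<int> d(2000, 6000);
        next_blink_ms_ = d(rng_);
    }

    std::mt19937 rng_;
    DeviceState state_ = DeviceState::Idle;
    int phase_ms_ = 0;
    int blink_left_ms_ = 0;
    int next_blink_ms_ = 0;
};
```

In the firmware, an LVGL timer callback (for example one registered with `lv_timer_create`) could call `tick()` with the timer period and apply the returned heights to the eye and mouth objects. Because each call does a fixed, small amount of work and returns, the animation never holds up other tasks.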
The entire build runs on the ESP32-S3 without external processors, making it a compact AI desk companion built entirely on a breadboard. It does, however, rely on the Xiaozhi server.
OLED Connections
- SDA → GPIO5 (D4)
- SCL → GPIO6 (D5)
- VCC → 3.3V / 5V
- GND → GND

I2S Microphone Connections
- SCK / BCLK → GPIO44 (D7)
- WS / LRCK → GPIO9 (D10)
- SD / DOUT → GPIO1 (D0)
- VCC → 3.3V
- GND → GND

MAX98357A Amplifier Connections
- BCLK → GPIO7 (D8)
- LRC / LRCK → GPIO4 (D3)
- DIN → GPIO2 (D1)
- VIN → 5V (recommended)
- GND → GND
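The wiring above can be captured as a set of named constants so the pin assignments live in one place. This is only a sketch: the `pins` namespace and the constant names are hypothetical and do not come from the Xiaozhi firmware, which defines its board pins in its own configuration files. The GPIO numbers are the ones listed in the connections.

```cpp
#include <cstddef>

namespace pins {
// OLED (I2C)
constexpr int kOledSda = 5;   // D4
constexpr int kOledScl = 6;   // D5
// I2S microphone
constexpr int kMicBclk = 44;  // D7, SCK / BCLK
constexpr int kMicWs   = 9;   // D10, WS / LRCK
constexpr int kMicData = 1;   // D0, SD / DOUT
// MAX98357A amplifier
constexpr int kAmpBclk = 7;   // D8
constexpr int kAmpLrck = 4;   // D3, LRC / LRCK
constexpr int kAmpDin  = 2;   // D1

constexpr int kAll[] = {kOledSda, kOledScl, kMicBclk, kMicWs,
                        kMicData, kAmpBclk, kAmpLrck, kAmpDin};
}  // namespace pins

// Returns true when no GPIO number appears twice in the array.
template <std::size_t N>
constexpr bool all_unique(const int (&a)[N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            if (a[i] == a[j]) return false;
    return true;
}

// Fails the build if two signals are accidentally mapped to the same GPIO.
static_assert(all_unique(pins::kAll), "a GPIO is assigned to two signals");
```

The compile-time check is a convenience for catching copy-and-paste mistakes when rewiring the breadboard. It does not verify that a given GPIO is suitable for its role.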










