I modified the Xiaozhi firmware for the Seeed Xiao ESP32-S3 and replaced the default emoji UI with a fully custom animated face.
The project uses an I2S microphone for input, a MAX98357A I2S amplifier for speaker output, and a 128x64 OLED display driven by LVGL. The firmware was updated to remove subtitles and icon-based emotions, and instead render animated eyes and a mouth that react to device state.
The face includes idle movement, random blinking, a listening expression with asymmetric eyes, and a speaking animation where the mouth opens and closes dynamically. All animations are non-blocking and handled with an LVGL timer, so audio and network performance remain unaffected.
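The non-blocking approach can be sketched as a small state machine that is advanced once per timer period. This is a minimal illustration, not the project's actual code: `FaceAnimator`, `FaceFrame`, the pixel sizes, and the timing constants are all assumed names and values. In the firmware, a callback registered with LVGL's `lv_timer_create` would call `tick()` and redraw the eyes and mouth from the returned frame.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Device states the face reacts to (illustrative names).
enum class FaceState { Idle, Listening, Speaking };

// What a renderer would draw for one frame, in pixels.
struct FaceFrame {
    int left_eye_h;   // 0 means the eye is closed (blink)
    int right_eye_h;
    int mouth_h;
};

// Assumed sizes and timings, chosen only for this sketch.
constexpr int kEyeOpen = 16;
constexpr int kMouthClosed = 2;
constexpr int kMouthOpen = 12;
constexpr uint32_t kMouthPeriodMs = 400;  // one open/close cycle
constexpr uint32_t kBlinkMs = 120;        // how long a blink lasts

class FaceAnimator {
public:
    explicit FaceAnimator(uint32_t seed = 1) : rng_(seed) { schedule_blink(); }

    void set_state(FaceState s) { state_ = s; }

    // Advance the animation by dt_ms and return the frame to draw.
    // Does a small, bounded amount of work and never waits or sleeps.
    FaceFrame tick(uint32_t dt_ms) {
        now_ms_ += dt_ms;
        FaceFrame f{kEyeOpen, kEyeOpen, kMouthClosed};

        switch (state_) {
        case FaceState::Idle:
            break;
        case FaceState::Listening:
            f.right_eye_h = kEyeOpen / 2;  // asymmetric eyes
            break;
        case FaceState::Speaking: {
            // Triangle wave: the mouth opens, then closes, every period.
            const uint32_t half = kMouthPeriodMs / 2;
            const uint32_t phase = now_ms_ % kMouthPeriodMs;
            const uint32_t up = phase < half ? phase : kMouthPeriodMs - phase;
            const uint32_t range = static_cast<uint32_t>(kMouthOpen - kMouthClosed);
            f.mouth_h = kMouthClosed + static_cast<int>(range * up / half);
            assert(f.mouth_h >= kMouthClosed && f.mouth_h <= kMouthOpen);
            break;
        }
        }

        // Random blinking: close both eyes for kBlinkMs, then pick a new deadline.
        if (now_ms_ >= next_blink_ms_) {
            if (now_ms_ - next_blink_ms_ < kBlinkMs) {
                f.left_eye_h = 0;
                f.right_eye_h = 0;
            } else {
                schedule_blink();
            }
        }
        return f;
    }

private:
    // Next blink starts 2 to 5 seconds from now (simple LCG, no hardware RNG).
    void schedule_blink() {
        rng_ = rng_ * 1664525u + 1013904223u;
        next_blink_ms_ = now_ms_ + 2000u + (rng_ >> 16) % 3000u;
    }

    uint32_t rng_;
    uint32_t now_ms_ = 0;
    uint32_t next_blink_ms_ = 0;
    FaceState state_ = FaceState::Idle;
};
```

Because `tick()` only updates a few integers and returns, the timer callback finishes quickly and leaves the audio and network tasks free to run. Switching expressions is then just a call to `set_state()` whenever the device starts listening or speaking.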
The entire build runs on a single ESP32-S3 with no external processor, making it a compact AI desk companion that fits on a breadboard. It does still rely on the Xiaozhi server.
OLED Connections
- SDA → GPIO5 (D4)
- SCL → GPIO6 (D5)
- VCC → 3.3V / 5V
- GND → GND
I2S Microphone Connections
- SCK / BCLK → GPIO44 (D7)
- WS / LRCK → GPIO9 (D10)
- SD / DOUT → GPIO1 (D0)
- VCC → 3.3V
- GND → GND
MAX98357A Amplifier Connections
- BCLK → GPIO7 (D8)
- LRC / LRCK → GPIO4 (D3)
- DIN → GPIO2 (D1)
- VIN → 5V (recommended)
- GND → GND
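
One way to keep this wiring in a single place in the firmware is a small pin-map struct. The sketch below is an assumption about how that could look, not the project's actual board configuration: `PinMap`, `kPins`, and `all_distinct` are hypothetical names. Only the GPIO numbers come from the lists above, where the SCK/WS/SD group is taken to be the I2S microphone and the BCLK/LRC/DIN group the MAX98357A amplifier.

```cpp
// GPIO assignments for the Xiao ESP32-S3, grouped by peripheral.
struct PinMap {
    int oled_sda, oled_scl;           // I2C OLED
    int mic_sck, mic_ws, mic_sd;      // I2S microphone (input)
    int amp_bclk, amp_lrck, amp_din;  // MAX98357A amplifier (output)
};

constexpr PinMap kPins{
    /*oled_sda=*/5,  /*oled_scl=*/6,
    /*mic_sck=*/44,  /*mic_ws=*/9,   /*mic_sd=*/1,
    /*amp_bclk=*/7,  /*amp_lrck=*/4, /*amp_din=*/2,
};

// Returns true if no GPIO number is assigned to two different signals.
constexpr bool all_distinct(const PinMap& p) {
    const int pins[] = {p.oled_sda, p.oled_scl, p.mic_sck, p.mic_ws,
                        p.mic_sd,   p.amp_bclk, p.amp_lrck, p.amp_din};
    const int n = sizeof(pins) / sizeof(pins[0]);
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (pins[i] == pins[j]) return false;
        }
    }
    return true;
}

// Catch accidental double assignments at compile time.
static_assert(all_distinct(kPins), "two signals share the same GPIO");
```

The `static_assert` makes the build fail if a later edit wires two signals to the same GPIO, which is an easy mistake to make on a board with this few exposed pins.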










