Thanks to kodeOS, you have two ways to try this:
- Install the ready-made app on your Kode Dot — no setup, just grab and go.
- Download the PlatformIO source and play with it: tweak prompts, add actions, or extend the UI.
Press the front button (or touch), speak naturally, and watch the device think and then act: turn on an LED, sweep a servo, or trigger modules on the top 5 V connector. The AMOLED shows a friendly reply while, in parallel, a non-blocking state machine executes a sequence of GPIO/servo commands.
This project was built using the pre-production Kode Dot prototype, currently featured in our Kickstarter campaign.
How it works (deep dive)
- Audio in: the ESP32-S3 captures mono PCM at 32 kHz/16-bit (AudioManager), up to 30 s.
- Model call: a FreeRTOS task (openaiTask) sends the audio to OpenAI (model: gpt-4o-audio-preview) through a simple HTTP client (BasicGPTClient).
- Structured reply: the model is forced via a system prompt to return two lanes:
  - Response: short, human-friendly text for the screen.
  - Actions: a compact control block for the executor (machine-readable).
- Parallelism: the UI thread prints Response with a typewriter effect while the executor consumes Actions without blocking the UI (state = EXECUTING/WAITING with timestamps).
- 5 V power: the BQ25896 PMIC is initialized in OTG/Boost mode to feed the top connector with 5 V for servos and peripherals.
- Pins: user-safe GPIO set {1, 2, 3, 11, 12, 13, 39, 40, 41, 42}.
- Memory: a short conversation history (up to 6 messages) keeps the chat natural without blowing RAM.
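The bounded conversation history can be sketched as a small capped deque; the class and field names here (e.g. `ChatHistory`) are illustrative, not the project's actual API:

```cpp
#include <cassert>
#include <deque>
#include <string>

// Illustrative sketch: keep at most 6 messages of context so the chat
// stays natural while RAM use remains bounded.
struct ChatHistory {
    static constexpr size_t kMaxMessages = 6;
    std::deque<std::string> messages;

    void add(const std::string& msg) {
        messages.push_back(msg);
        while (messages.size() > kMaxMessages)
            messages.pop_front();  // drop the oldest message first
    }
};
```

Each new user/assistant turn is appended, and anything older than six entries falls off the front before the next API request is built.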
The system prompt enforces a strict split between what the user reads and what the device executes. The firmware expects exactly:
Response: <short friendly sentence for screen>
Actions: BEGIN; GPIO(<pin>,ON|OFF); SERVO(<pin>,<0..180>); DELAY(<ms>); END

Examples the model sees:

“Turn on pin 11”
Response: Sure! I've activated pin 11 for you.
Actions: BEGIN;GPIO(11,ON);END

“Sweep servo on pin 3”
Response: Watch this! I'll sweep the servo for you.
Actions: BEGIN;SERVO(3,0);DELAY(500);SERVO(3,90);DELAY(500);SERVO(3,180);END

This simple grammar makes parsing trivial on-device and keeps the LLM free to be conversational without risking unsafe free-form code.
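The action grammar really is trivial to tokenize. Below is a host-side sketch of such a tokenizer; the names (`parseActions`, `Command`) are illustrative and may differ from the firmware's actual `parseActionSequence()`:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One parsed action; `value` holds the ON/OFF state, servo angle, or delay ms.
struct Command { std::string op; int pin; int value; };

// Split "BEGIN;GPIO(11,ON);DELAY(500);END" on ';' and decode each token.
std::vector<Command> parseActions(const std::string& block) {
    std::vector<Command> cmds;
    std::stringstream ss(block);
    std::string tok;
    while (std::getline(ss, tok, ';')) {
        if (tok == "BEGIN" || tok == "END" || tok.empty()) continue;
        auto open = tok.find('('), close = tok.find(')');
        if (open == std::string::npos || close == std::string::npos) continue;
        std::string op = tok.substr(0, open);
        std::string args = tok.substr(open + 1, close - open - 1);
        auto comma = args.find(',');
        if (op == "DELAY") {
            cmds.push_back({op, -1, std::stoi(args)});
        } else if (op == "GPIO") {
            int pin = std::stoi(args.substr(0, comma));
            int on = (args.substr(comma + 1) == "ON") ? 1 : 0;
            cmds.push_back({op, pin, on});
        } else if (op == "SERVO") {
            int pin = std::stoi(args.substr(0, comma));
            cmds.push_back({op, pin, std::stoi(args.substr(comma + 1))});
        }
    }
    return cmds;
}
```

Malformed tokens are simply skipped, which matches the "tolerate formatting noise" behavior the parser needs when an LLM produces the block.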
Parser → State machine → Drivers
- Parser: parseGPTResponse() extracts the BEGIN … END block and the visible Response: text, tolerating formatting noise.
- Sequence builder: parseActionSequence() tokenizes GPIO(), SERVO(), and DELAY() into a commands[] vector.
- Executor: a non-blocking loop advances the sequence:
  - GPIO_SET → pinMode(pin, OUTPUT) + digitalWrite
  - SERVO_SET → lazy attach() + Servo.write(angle)
  - DELAY → switch to WAITING until millis() >= waitUntil
- Parallel UI: while the executor runs, UIManager keeps animating the reply and status. No delay() call locks the screen.
Tiny pseudocode
Plan p = parseJsonLike(raw);
ui.showText(p.response); // visible immediately
executor.load(p.actions); // runs in parallel
loop() {
ui.update(); // LVGL / gfx
executor.tick(millis()); // one small step if due
}

Hardware & wiring
- Kode Dot (ESP32-S3) with AMOLED touch, mic, speaker.
- Top 5 V bus: enabled by initPMIC() → pmic.setOTG_CONFIG(true).
- Demo setup:
  - LED on GPIO 12 (active HIGH).
  - Micro-servo on GPIO 1 (signal), powered from the top 5 V connector.
- Button/Touch: either input starts/stops recording; simple debounce.
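Since only the user-safe GPIO set should ever be driven by a model-generated plan, a guard like the following (the helper name `isSafePin` is illustrative) can veto any command before it reaches the drivers:

```cpp
#include <cassert>
#include <set>

// User-safe GPIO set from the project; anything outside it is rejected
// before pinMode()/digitalWrite() are ever called.
const std::set<int> kSafePins = {1, 2, 3, 11, 12, 13, 39, 40, 41, 42};

bool isSafePin(int pin) { return kSafePins.count(pin) > 0; }
```

This keeps an LLM hallucination like `GPIO(0,ON)` from touching strapping or flash pins.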
The device connects to Wi-Fi in STA mode with auto-reconnect enabled and optimized PHY settings for reliability. Each request to the GPT API is bounded by a 20-second timeout to prevent blocking. If parsing fails or no valid audio is detected, the user interface immediately displays a clear error message and resets to the Ready state. Text shown on screen is pre-processed to normalize curly quotes, dashes, and ellipses for the built-in font, ensuring every response renders cleanly without crashes or encoding artifacts.
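The punctuation normalization can be sketched as a byte-level UTF-8 substitution pass; the function name is illustrative, and the real firmware's table may cover more characters:

```cpp
#include <cassert>
#include <string>

// Map common UTF-8 punctuation (curly quotes, dashes, ellipsis) to ASCII
// so the built-in font can render every response without artifacts.
std::string normalizeForFont(const std::string& in) {
    std::string out;
    for (size_t i = 0; i < in.size(); ) {
        if (in.compare(i, 3, "\xE2\x80\x98") == 0 ||
            in.compare(i, 3, "\xE2\x80\x99") == 0) { out += '\''; i += 3; }  // ‘ ’
        else if (in.compare(i, 3, "\xE2\x80\x9C") == 0 ||
                 in.compare(i, 3, "\xE2\x80\x9D") == 0) { out += '"'; i += 3; }  // “ ”
        else if (in.compare(i, 3, "\xE2\x80\x93") == 0 ||
                 in.compare(i, 3, "\xE2\x80\x94") == 0) { out += '-'; i += 3; }  // – and em dash
        else if (in.compare(i, 3, "\xE2\x80\xA6") == 0) { out += "..."; i += 3; }  // …
        else { out += in[i]; ++i; }
    }
    return out;
}
```

Running this over every model reply before it reaches the display avoids the classic "tofu box" glyphs that appear when a small embedded font meets typographic punctuation.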
Why this approach?
LLMs are great at language, but hardware needs determinism. Here we force a tiny action grammar, keep the human-readable reply completely separate from the machine plan, and execute that plan with a non-blocking state machine. The result is reliable behavior under embedded constraints, with no UI freezes and no ambiguous parsing, while preserving the “talk to your device” magic.