•

Published June 11, 2026 © MIT

I Built a Voice-Activated Memo Board on a 10.3-Inch E-Paper

Press a button, speak your reminder, and an AI-powered 10.3-inch e-paper board displays it as a scheduled card no typing, no phone, no app

AdvancedFull instructions provided6 hours38

I Built a Voice-Activated Memo Board on a 10.3-Inch E-Paper

Things used in this project

Hardware components

Seeed Studio reTerminal E1003 (ESP32-S3)

Software apps and online services

Qoder

I used Qoder with the PlatformIO extension to flash the firmware, but any PlatformIO-compatible IDE (e.g., VS Code) can be used to replicate this project.

PlatformIO extension

Speech-to-text / LLM API

Story

The problem on my desk

A few months ago I was lucky enough to get my hands on a Seeed Studio reTerminal E1003 — an ESP32S3-based dev board with a 10.3-inch e-paper display running at 1872 by 1404 pixels in 16-level grayscale. It is a gorgeous panel. The moment I peeled the protective film off and powered it on, the first thing that struck me was how much it looked like a sheet of paper sitting on my desk. Real paper. The kind you pin notes to.

That got me thinking about a problem that had been quietly ruining my productivity for months.

My day job involves juggling requests from every direction — hardware teams, software teams, community partners, documentation deadlines, sample shipments, demo prep. Things pile up fast. I used to tell myself I would jot everything down in the phone memo app or the macOS Reminders widget. And I did, for a while. But here is what actually happens: someone walks up to my desk and says "hey, can you send that schematic by Thursday?" I nod. I think "I'll write that down in a second." Then another message pops up on Slack, and by the time I look up from the screen, the schematic thing is gone. Evaporated. I only remember it at 11 PM when I am brushing my teeth.

The friction is typing. Even pulling out my phone, unlocking it, opening the app, and thumb-typing a few words is enough of a barrier that I skip it more often than not. Over time the skipped notes pile up into a quiet avalanche of forgotten commitments. I needed something with zero friction. Something where I could just open my mouth, say what I need to remember, and be done.

So I decided to turn that big, beautiful e-paper screen into a voice-activated reminder board.

The plan

The reTerminal also takes a few seconds from button press to result. Why would this be any different?

Because the reTerminal sits right next to my keyboard. I do not have to reach into a pocket, unlock a screen, or switch out of whatever app I am working in. It is just there, like a notepad, always within arm's reach, always showing my current reminders. And here is the part that surprised me: this device, which is essentially a dedicated display, actually has a PDM microphone built into the board. A display with a microphone. That meant I could skip typing entirely and just talk to it.

This image is generated by GPT, for reference only.

That realization is what turned a vague idea into a concrete plan. Press the green button on top — KEY0 — and speak naturally: "Remind me to send the schematic to Zhang Wei by Thursday afternoon." Release the button. The device records your voice, sends the audio to a speech-to-text service, then passes the transcript to a large language model that extracts the task and the deadline. The result appears on the e-paper as a neatly formatted reminder card with the due date and time.

No typing. No app switching. No phone. Just press, speak, done. The whole interaction happens without leaving my chair or taking my eyes off my work.

Under the hood, the reTerminal E1003 packs a Seeed XIAO ESP32S3 Sense with 8 MB of OPI PSRAM, a PCF8563 real-time clock, and WiFi — everything I needed in one board. The e-paper is driven by an IT8951 controller that handles the 16-level grayscale rendering, and the panel is big enough to show eight reminder cards at once without squinting.

For the cloud side I went with Groq's free API tier. Whisper Large V3 Turbo handles the speech recognition, and LLaMA 3.3 70B Versatile does the memo extraction and scheduling. Both are free to use within generous rate limits, which means the whole project runs without spending a cent on API costs.

The moment I noticed the touch screen

About a week into development I had a basic prototype working. Press the button, speak, see the memo appear. It felt magical the first time. But then I ran into an obvious question: the screen can show eight memos. What happens when you have nine?

I was sketching out a scroll mechanism in my head — maybe a second button to page through — when I remembered something I had glossed over in the E1003 datasheet. The panel has a GT911 capacitive touch controller. A touch screen. On an e-paper display.

That changed everything. Instead of building pagination, I could let the user tap a checkbox next to any completed reminder. Done items turn gray and sink to the bottom of the list. When you record a ninth memo, it automatically replaces the oldest completed item. If nothing is checked off yet, the oldest entry gets bumped.

This is exactly how I think about to-do lists in real life. I do not organize them into categories or projects. I just dump everything in a stream, check off what I have finished, and let the list refresh itself. The touch screen made that possible without any extra buttons or menus.

How it actually works

When you press KEY0, the firmware immediately starts capturing audio through the PDM microphone at 16 kHz, 16-bit mono. The raw samples stream into a buffer allocated in PSRAM — at 20 seconds maximum, that is roughly 640 KB. A short beep from the onboard buzzer confirms that recording has started, and the LED lights up solid.

When you release the button, the device writes a WAV header onto the buffer, connects to WiFi if it is not already connected, and uploads the audio to Groq's Whisper endpoint. The transcript comes back in a few seconds. Then it goes to LLaMA 3.3 with a carefully constructed prompt that includes the current date, time, and day of the week. The model returns a structured JSON object containing the memo text, a due date and time, and a fuzzy label like "Tomorrow morning" or "Next Friday."

The firmware parses that JSON, saves the entry to the ESP32's non-volatile storage, sorts the list by due time, and redraws the e-paper. The whole cycle — from releasing the button to seeing the new card — takes about five to eight seconds depending on network conditions.

Every five minutes, even when idle, the display refreshes to update the header clock and re-sort the list. Overdue items get relabeled. And once a day, the device fetches a short motivational quote from the same LLM and displays it in the footer, just to keep the screen from feeling like a static piece of furniture.

The hard parts

Building this project taught me more than I expected. Here are the problems that kept me up at night and how I ended up solving them.

Recording audio on a device that hates multitasking

This image is generated by GPT, for reference only.

The first version of the recording code had a nasty bug. When you pressed the button, I would refresh the e-paper to show a "Recording..." status, then start capturing audio. It looked great in testing — until I played back the recordings and heard the first half-second was always silent or garbled.

The problem is that e-paper refreshes are slow and CPU-intensive. The IT8951 controller takes hundreds of milliseconds to push a full frame, and during that time the I2S DMA ring buffer — which only holds about 256 milliseconds of audio — quietly overflows. By the time the screen finishes updating and the code gets around to reading audio data, the first chunk is already gone.

The fix was counterintuitive: do not update the screen at all when recording starts. The moment KEY0 goes low, start capturing immediately. The LED and buzzer give the user feedback instead. Only after the button is released and the audio is safely in PSRAM do I refresh the display to show "Processing." It means there is no visual change on screen during recording, but you can hear the beep and see the LED, and the audio quality is perfect.

Teaching the AI to understand "tomorrow morning"

When someone says "remind me to water the plants tomorrow morning, " the language model needs to know what "tomorrow" means. If the device thinks it is January 1st but it is actually March 15th, every deadline will be wrong.

I solved this by injecting the device's current local date, time, and weekday name directly into the LLM system prompt. The prompt also instructs the model to return a strict JSON format with separate fields for the exact due time and a human-friendly label. I set the temperature to 0 so the output is deterministic — the same input always produces the same schedule.

There is a subtlety with fuzzy times. If someone says "remind me this evening, " there is no exact minute to anchor to. The model returns a fuzzy label like "Evening" and leaves the exact time as a reasonable default (say, 18:00). The display shows the fuzzy label prominently on the card so it feels natural rather than artificially precise.

The tricky part was handling edge cases. "Next Monday" when today is Monday should mean the Monday after this one, not today. "End of the month" should resolve to the actual last day of the current month. I found that LLaMA 3.3 handles most of these correctly as long as the prompt gives it enough context — the full date, the weekday name, and an explicit instruction about how to interpret "next [weekday]."

Making it work on smaller screens too

The reTerminal family includes two smaller devices: the E1001 with an 800 by 480 grayscale panel, and the E1002 with an 800 by 480 six-color panel. Neither has a touch screen.

I wanted the same firmware to support all three, but the smaller panels cannot fit eight cards. They top out at four. And without touch, there is no way for the user to check off completed items.

The solution was a replacement policy: when the four-card page is full, the next recording replaces whichever visible memo has the earliest due time. The idea is that the earliest one is most likely to be either done or irrelevant by now. It is not perfect, but it matches the "running log" workflow — you always see the most recent and most relevant reminders, and stale ones rotate out automatically.

The compact card layout uses smaller fonts and tighter spacing but keeps the same visual structure: checkbox icon on the left (purely decorative on non-touch devices), text in the middle, date chip on the right. It looks like a miniature version of the E1003 layout rather than a completely different UI.

Running it all for free

One of my design goals was that nobody should have to pay to use this project. Groq offers a free tier with generous rate limits for both Whisper and LLaMA. A typical day of use — maybe ten to fifteen recordings plus a daily quote fetch — barely scratches the surface of the free quota.

For developers who want to test without burning any API calls at all, the project includes a local Python gateway server. You run it on your laptop, point the firmware at its IP address, and it handles everything locally — either with a mock transcription for UI testing, or with a local Whisper model via faster-whisper and a local LLM through Ollama. The gateway is a single Python file with no required dependencies beyond the standard library in mock mode.

What it means to me

This image is generated by GPT, for reference only.

The device has been sitting on my desk for a few weeks now, and I have stopped forgetting things. That sounds like a small claim, but it has genuinely changed my daily routine. When someone tells me something I need to remember, I reach over, press the green button, and say it out loud. Three seconds later it is on the screen. I do not have to context-switch, pull out my phone, or even look away from my conversation.

The e-paper display is always on, always readable, and draws almost no power. It looks like a piece of stationery. Visitors to my desk sometimes do not even realize it is a computer until they see the time update.

If you have a reTerminal E-series board and you have ever wished you could just talk to your to-do list, I hope this project is useful to you. The firmware, the gateway, and all the source code are on GitHub. Flash it, fill in your API key, and start talking.

Credits

Mason

1 project • 0 followers

Mengdu

11 projects • 28 followers

I Built a Voice-Activated Memo Board on a 10.3-Inch E-Paper