The problem on my desk
What this project does
From Passive Dashboard to Voice Reminder Board
The moment I noticed the touch screen
How It Actually Works
Build and Flash the Firmware
The Hard Parts
Troubleshooting
What it means to me

•

Published June 11, 2026 © MIT

Turn a 10.3-Inch E-Paper Display into an AI Voice Memo Board

Turn the ESP32-S3-powered reTerminal E1003 into an AI voice memo board with speech-to-text, reminder extraction, and touch interaction.

AdvancedFull instructions provided6 hours88

Things used in this project

Hardware components

Seeed Studio reTerminal E1003 (ESP32-S3)

Software apps and online services

Qoder

I used Qoder with the PlatformIO extension to flash the firmware, but any PlatformIO-compatible IDE (e.g., VS Code) can be used to replicate this project.

PlatformIO extension

Speech-to-text / LLM API

Story

The problem on my desk

A few months ago, I was lucky enough to get my hands on a Seeed Studio reTerminal E1003—an ESP32S3-based dev board with a 10.3-inch e-paper display running at 1872 by 1404 pixels in 16-level grayscale. It is a gorgeous panel. The moment I peeled the protective film off and powered it on, the first thing that struck me was how much it looked like a sheet of paper sitting on my desk. Real paper. The kind you pin notes to.

That got me thinking about a problem that had been quietly ruining my productivity for months.

My day job involves juggling requests from every direction, from hardware teams, software teams, community partners, documentation deadlines, sample shipments, to demo prep. Things pile up fast. I used to tell myself I would jot everything down in the phone memo app or the macOS Reminders widget. And I did, for a while. But here is what actually happens: someone walks up to my desk and says, "Hey, can you send that schematic by Thursday?" I nod. I think, "I'll write that down in a second." Then another message pops up on Slack, and by the time I look up from the screen, the schematic thing is gone. Evaporated. I only remember it at 11 PM when I am brushing my teeth.

The friction is typing. Even pulling out my phone, unlocking it, opening the app, and thumb-typing a few words is enough of a barrier that I skip it more often than not. Over time the skipped notes pile up into a quiet avalanche of forgotten commitments. I needed something with zero friction. Something where I could just open my mouth, say what I need to remember, and be done.

So I decided to turn that big, beautiful e-paper screen into a voice-activated reminder board.

The reTerminal E1003 sitting on my desk as an always-on reminder board.

What this project does

By the end of this project, the reTerminal E1003 becomes an always-on voice memo reminder board.

You can press KEY0, speak a reminder naturally, and let the device turn your voice into a time-aware task card on the 10.3-inch e-paper display. The reminder is saved locally on the ESP32, sorted by due time, and can be marked complete directly from the touch screen.

The goal is not to build another general-purpose voice assistant. It is to build a quiet, focused desk companion for capturing small tasks before they disappear.

From Passive Dashboard to Voice Reminder Board

Most of my earlier e-paper experiments were passive dashboards: weather boards, calendar layouts, and static information panels. They were useful, but they mostly displayed information that already existed somewhere else. For this project, I wanted the e-paper display to do something more active. Instead of only showing data, it should help me capture a thought the moment it appears.

Of course, a voice assistant or phone reminder app can already do something similar. So why build a dedicated e-paper device?

Because the reTerminal sits right next to my keyboard. I do not have to reach into a pocket, unlock a screen, or switch out of whatever app I am working in. It is just there, like a notepad, always within arm's reach, always showing my current reminders. And here is the part that surprised me: this device, which is essentially a dedicated display, actually has a PDM microphone built into the board. A display with a microphone. That meant I could skip typing entirely and just talk to it.

Press button KEY0 to record your voice.

That realization is what turned a vague idea into a concrete plan. If the device already sits next to my keyboard, has an always-on e-paper display, and includes a microphone, then it does not need to be just a dashboard anymore. It can become a small capture tool for the desk: press a button, say the task, and let the reminder appear without opening another screen.

No typing. No app switching. No phone. Just press, speak, and be done. The whole interaction happens without leaving my chair or taking my eyes off my work.

Under the hood, the reTerminal E1003 packs a Seeed XIAO ESP32S3 Sense with 8 MB of OPI PSRAM, a PCF8563 real-time clock, and Wi-Fi—everything I needed in one board. The e-paper is driven by an IT8951 controller that handles the 16-level grayscale rendering, and the panel is big enough to show eight reminder cards at once without squinting. In practice, the PSRAM gives enough space for short audio buffers, the RTC keeps reminders time-aware, and the large grayscale panel gives the interface room to breathe.

For the cloud side, I used Groq's free API tier. Whisper Large V3 Turbo handles the speech recognition, and LLaMA 3.3 70B Versatile extracts the task, due date, due time, and display label. For a typical desk reminder workflow, around ten to fifteen recordings a day plus a daily quote refresh, the free quota is more than enough for testing and daily use.

For developers who want to avoid cloud calls during testing, the project also includes a local Python gateway server. You can run it on your laptop, point the firmware at its IP address, and use either mock transcription for UI testing or local models through faster-whisper and Ollama.

The moment I noticed the touch screen

About a week into development I had a basic prototype working. Press the button, speak, and see the memo appear. It felt magical the first time. But then I ran into an obvious question: the screen can show eight memos. What happens when you have nine?

I was sketching out a scroll mechanism in my head—maybe a second button to page through—when I remembered something I had glossed over in the E1003 datasheet. The panel has a GT911 capacitive touch controller. A touch screen. On an e-paper display.

That changed everything. Instead of building pagination, I could let the user tap a checkbox next to any completed reminder. Done items turn gray and sink to the bottom of the list. When you record a ninth memo, it automatically replaces the oldest completed item. If nothing is checked off yet, the oldest entry gets bumped.

This is exactly how I think about to-do lists in real life. I do not organize them into categories or projects. I just dump everything in a stream, check off what I have finished, and let the list refresh itself. The touch screen made that possible without any extra buttons or menus.

Tapping a reminder marks it as complete and moves it down the list.

How It Actually Works

The user interaction is simple, but several things happen behind the scenes after each recording.

The complete workflow of the voice memo reminder, from recording to task completion.

The recording step is intentionally simple from the user's side, but there is quite a bit happening underneath. As soon as KEY0 goes low, the firmware starts capturing audio through the PDM microphone at 16 kHz, 16-bit mono. The raw samples stream into a buffer allocated in PSRAM, at 20 seconds maximum, that is roughly 640 KB. A short beep from the onboard buzzer confirms that recording has started, and the LED lights up solid.

When you release the button, the device writes a WAV header onto the buffer, connects to Wi-Fi if it is not already connected, and uploads the audio to Groq's Whisper endpoint. The transcript comes back in a few seconds. Then it goes to LLaMA 3.3 with a carefully constructed prompt that includes the current date, time, and day of the week. The model returns a structured JSON object containing the memo text, a due date and time, and a fuzzy label like "tomorrow morning" or "next Friday."

Example of a successful voice reminder conversion from natural language to structured JSON.

For example, a sentence like “Remind me to get off work on Sunday, 15:30” can be converted into a small structured object:

{
  "task": "Get off work",
  "due_date": "2026-06-07",
  "due_time": "15:30",
  "label": "Sunday, 15:30"
}

The JSON object becomes the contract between the AI layer and the firmware. Once it comes back, the ESP32 only needs to parse the structured result, save it to NVS, sort the reminders by due time, and redraw the e-paper display. The whole cycle, from releasing the button to seeing the new card, takes about five to eight seconds depending on network conditions.

Every five minutes, even when idle, the display refreshes to update the header clock and re-sort the list. Overdue items get relabeled. And once a day, the device fetches a short motivational quote from the same LLM and displays it in the footer, just to keep the screen from feeling like a static piece of furniture.

Build and Flash the Firmware

The build process is fairly short because the reTerminal E1003 already includes the display, ESP32-S3, microphone, touch panel, RTC, and button in one device.

The firmware source code and detailed setup instructions are available in GitHub Repository: ePaper-Voice-Memo

In short, the setup process is:

Open the project in Qoder or VS Code with the PlatformIO extension.
Copy the example secrets file and fill in your Wi-Fi credentials and Groq API key.
Select the reTerminal E1003 build target and upload the firmware.
Open the serial monitor to check the Wi-Fi connection and API output.

Firmware successfully flashed to the reTerminal E1003 in Qoder.

After flashing, hold KEY0, speak a reminder, release the button, and wait for the first task card to appear on the e-paper display. Once this works, the rest of the interaction happens directly on the device: record with KEY0, view reminders on the screen, and tap the checkbox to mark a task complete.

The Hard Parts

Building this project taught me more than I expected. Here are the problems that kept me up at night and how I ended up solving them.

1. Recording audio on a device that hates multitasking

Concept illustration of the recording issue.

The first version of the recording code had a nasty bug. When you pressed the button, I would refresh the e-paper to show a "Recording..." status, then start capturing audio. It looked great in testing until I played back the recordings and heard the first half-second was always silent or garbled.

The problem is that e-paper refreshes are slow and CPU-intensive. The IT8951 controller takes hundreds of milliseconds to push a full frame, and during that time the I2S DMA ring buffer, which only holds about 256 milliseconds of audio, quietly overflows. By the time the screen finishes updating and the code gets around to reading audio data, the first chunk is already gone.

The fix was counterintuitive: do not update the screen at all when recording starts. The moment KEY0 goes low, start capturing immediately. The LED and buzzer give the user feedback instead. Only after the button is released and the audio is safely in PSRAM do I refresh the display to show "Processing." It means there is no visual change on screen during recording, but you can hear the beep and see the LED, and the audio quality is perfect.

2. Teaching the AI to understand "tomorrow morning"

When someone says "Remind me to water the plants tomorrow morning," the language model needs to know what "tomorrow" means. If the device thinks it is January 1st but it is actually March 15th, every deadline will be wrong.

I solved this by injecting the device's current local date, time, and weekday name directly into the LLM system prompt. The prompt also instructs the model to return a strict JSON format with separate fields for the exact due time and a human-friendly label. I set the temperature to 0 so the output is deterministic—the same input always produces the same schedule.

There is a subtlety with fuzzy times. If someone says "remind me this evening," there is no exact minute to anchor to. The model returns a fuzzy label like "Evening" and leaves the exact time as a reasonable default (say, 18:00). The display shows the fuzzy label prominently on the card so it feels natural rather than artificially precise.

The tricky part was handling edge cases. "Next Monday" when today is Monday should mean the Monday after this one, not today. "End of the month" should resolve to the actual last day of the current month. I found that LLaMA 3.3 handles most of these correctly as long as the prompt gives it enough context—the full date, the weekday name, and an explicit instruction about how to interpret "next [weekday]."

3. Making it work on smaller screens too

The reTerminal family includes two smaller devices: the E1001 with an 800 by 480 grayscale panel and the E1002 with an 800 by 480 six-color panel. Neither has a touch screen.

I wanted the same firmware to support all three, but the smaller panels cannot fit eight cards. They top out at four. And without touch, there is no way for the user to check off completed items.

The solution was a replacement policy: when the four-card page is full, the next recording replaces whichever visible memo has the earliest due time. The idea is that the earliest one is most likely to be either done or irrelevant by now. It is not perfect, but it matches the "running log" workflow, and you always see the most recent and most relevant reminders, and stale ones rotate out automatically.

The compact card layout uses smaller fonts and tighter spacing but keeps the same visual structure: a checkbox icon on the left (purely decorative on non-touch devices), text in the middle, and a date chip on the right. It looks like a miniature version of the E1003 layout rather than a completely different UI.

The firmware also supports both English and Chinese voice memos.

Troubleshooting

1. No serial port appears

Use a USB-C data cable, not a charge-only cable. Check the device battery level and confirm that the port appears in your system device manager.

2. Upload failed

Make sure the correct serial port is selected in PlatformIO. If the upload still fails, reconnect the device and try uploading again.

3. Recording starts, but no text appears

Check the Wi-Fi credentials, API key, and whether the network can reach the speech-to-text and LLM API endpoints.

4. The screen does not update immediately

The display refreshes periodically, and some status updates are intentionally handled with the LED and buzzer instead of a full e-paper refresh.

5. The time or quote does not update

Wait for the scheduled refresh cycle and check the network connection if the issue continues.

My voice memo board in my office under different lighting conditions.

What it means to me

The device has been sitting on my desk for a few weeks now, and I have stopped forgetting things. That sounds like a small claim, but it has genuinely changed my daily routine. When someone tells me something I need to remember, I reach over, press the green button, and say it out loud. A few seconds later it is on the screen. I do not have to context-switch, pull out my phone, or even look away from my conversation.

The e-paper display is always on, always readable, and draws almost no power. It looks like a piece of stationery. Visitors to my desk sometimes do not even realize it is a computer until they see the time update.

This is not a full voice assistant, and it still depends on Wi-Fi and cloud APIs unless you use the local gateway. But for one narrow problem, capturing small tasks before they disappear, the tradeoff feels right.

If you have a reTerminal E1003 board and have ever wished you could just talk to your to-do list, the firmware, gateway, and source code are all on GitHub. Flash it, add your API key, press the green button, and try talking to your desk.

Credits

Mason

1 project • 0 followers

Mengdu

11 projects • 28 followers

Turn a 10.3-Inch E-Paper Display into an AI Voice Memo Board