This Picture Speaks a Thousand Words

TinyTTS makes photos talk using an ESP32-S3-powered display and a neural speech module, turning captions into spoken memories.

nickbild
The TinyTTS device (📷: Paul)

It has often been said that a picture is worth a thousand words. Now, a picture can actually speak those words. An embedded engineer named Paul has created a device called TinyTTS, which uses a neural speech module to make photos talk. You supply the system with pictures and captions, and as each photo is displayed, TinyTTS speaks its caption aloud. In this way, past experiences can be relived, and it can feel as if distant loved ones are nearby.

The system is built around the Elecrow CrowPanel Advance 5-inch, which pairs an 800x480-pixel IPS capacitive touch display with an ESP32-S3 microcontroller. The CrowPanel reads and displays images, looks up the associated captions, and sends them over a UART connection to a tinyTTS kit. This module, powered by a Himax HX6538 processor, generates human-like speech without needing a cloud connection.
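The display-then-speak flow described above can be sketched in a few lines. This is a minimal, host-side illustration under stated assumptions, not Paul's firmware: the caption table, the image filenames, and the newline-terminated UTF-8 framing are all hypothetical, since the project's actual UART protocol for the tinyTTS kit is not described here. The `display` and `uart_write` callables stand in for whatever the real CrowPanel code uses to draw an image and write bytes to the serial port.

```python
# Hypothetical caption table mapping image filenames to caption text.
CAPTIONS = {
    "beach.jpg": "Our first trip to the coast.",
    "grandma.jpg": "Grandma's ninetieth birthday.",
}


def build_tts_frame(caption: str) -> bytes:
    """Encode a caption as one newline-terminated UTF-8 line.

    The framing is an assumption for illustration, not the tinyTTS kit's
    documented protocol. Internal whitespace (including newlines) is
    collapsed so that each caption always produces exactly one frame.
    """
    clean = " ".join(caption.split())
    return clean.encode("utf-8") + b"\n"


def show_and_speak(images, display, uart_write, captions=CAPTIONS):
    """Display each image, then send its caption (if any) to the TTS module.

    `display` and `uart_write` are caller-supplied callables (hypothetical
    stand-ins for the panel's drawing routine and a UART write). Images
    without a caption are still shown, just silently. Returns the number
    of captions sent.
    """
    sent = 0
    for image in images:
        display(image)
        caption = captions.get(image)
        if caption:
            uart_write(build_tts_frame(caption))
            sent += 1
    return sent
```

On a PC, `uart_write` could be the `write` method of a pyserial `Serial` object connected to the module; on the ESP32-S3 itself, the equivalent step would be a UART driver write call. Passing the I/O in as callables keeps the lookup-and-send logic testable without any hardware attached.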

The hardware (📷: Paul)

The synthetic speech is quite good but not perfectly natural, which may detract from the experience somewhat. And because it is a generic voice rather than an emulation of the people in the photos, hearing it speak for them could feel a little strange. Still, TinyTTS is a very interesting concept.

If you would like to make your own, you will find it a fairly inexpensive build. The bill of materials and step-by-step build instructions are available in the project write-up.
