Liz Clark's MEMENTO Smart Camera Sends Images to OpenAI for Textual Descriptions and More

Delivering a captured image plus a prompt to GPT-4 Vision, this compact smart camera can describe scenes, translate text, and more.

If you've ever found yourself lacking the words to describe what you're saying, Liz Clark's latest project for Adafruit may come in handy: turning the smart MEMENTO camera into a visual description engine with a link to OpenAI's GPT-4 Vision interface.

"The inspiration for this project came from the Descriptive Camera project by Matt Richardson," Clark explains of the MEMENTO-powered build. "It was built in 2012 with a BeagleBone that connected to the Amazon Mechanical Turk API [Application Programming Interface]. That API would outreach to people on the internet to complete tasks. In this case, it was asking those workers to create metadata manually for each photo submitted. Just over ten years later, and this process can now be automated and run without the need for a Linux OS."

This MEMENTO-powered project delivers textual descriptions of whatever the camera sees, courtesy of OpenAI's GPT-4. (πŸ“Ή: Liz Clark/Adafruit)

The secret sauce: OpenAI's multimodal GPT-4 large language model (LLM), which offers a Vision system for the analysis of images. Like all LLMs, it will respond to prompts with a textual missive calculated to mirror a real-world response β€” sometimes useful, sometimes not, and sometimes a hallucination. In this case, the prompt tells GPT-4 to describe a provided image transmitted directly from the MEMENTO camera system.

Built around an Espressif ESP32-S3, the Adafruit MEMENTO is a standalone smart camera system with Wi-Fi and Bluetooth Low Energy (BLE) connectivity. In Clark's project, it's used to capture a photo and then automatically transfer it to OpenAI for analysis in GPT-4. By default, the prompt asks for a straightforward description of the image β€” but alternate modes provide a haiku as a response, translate pictured text to English, and even attempt to identify pictured cables.

"Artificial intelligence is one of the most divisive issues of our time. The approach and mindset of this project is that at its core, AI is a tool that is only as good as the human giving input to it," Clark writes of the LLM behind the descriptive camera system. "You'll see that the prompts used in this project are worded in a way to try and achieve a result that is both useful and concise; with the keyword being try. Mistakes on the part of the API are entirely possible and responses should always be checked."

"However," Clark continues, "in general, the responses have been found to be accurate and, as a result, exciting, especially when considering how some of these more utilitarian prompts could be engineered in the future for accessibility applications."

With those caveats in mind, the project does offer a glimpse of how generative artificial intelligence (gen-AI) technology could aid accessibility β€” and projects like this are ideal for our Build2gether 2.0 Inclusive Innovation Challenge, which is open for entries now with 200 hardware "SUPERBOX" prizes up for grabs.

The full project write-up is available on the Adafruit Learn portal now; the MEMENTO is priced at $34.95, though at the time of writing was listed as out of stock on the Adafruit web shop.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire:
Latest articles
Sponsored articles
Related articles
Latest articles
Read more
Related articles