Simon Aubury's GenPiCam Runs Captured Images Through Two Generative AIs, with Oft-Surprising Results

Camera-at-two-removes uses one generator to turn real-world images into a textual description and another to turn that back into an image.

Systems architect Simon Aubury has developed a Raspberry Pi-powered camera with a difference: it doesn't capture what you point it at, but a recreation filtered through two generative artificial intelligence systems.

"Generative AI (GenAI) is a type of Artificial Intelligence that can create a wide variety of images, video and text," Aubury explains by way of background. "To accelerate the robot uprising I chained two GenAI models together to build a camera which describes the current scene in words, and then uses a second model to create a new generated stylized image. Let me introduce GenPiCam — a RaspberryPi based camera that re-imagines the world with GenAI."

This camera doesn't work in quite the way you'd expect, thanks to not one but two generative AI engines. (📹: Simon Aubury)

In terms of hardware, the project is pretty simple: a Raspberry Pi 4 Model B single-board computer sits in a custom housing with a display at the rear and a push-button switch wired to its general-purpose input/output (GPIO) header. A Raspberry Pi Camera Module, located at the front, captures a still every time the button is pushed. Rather than simply saving the resulting image, though, the camera then begins a process of AI-based transformation.
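As a rough illustration of that first step, the sketch below shows how a button-triggered capture can be wired up in Python using the gpiozero and picamera2 libraries. It stands in for, rather than reproduces, Aubury's own code, and the GPIO pin number and file naming are purely illustrative assumptions.

```python
# Minimal sketch (not Aubury's published code): capture a still image
# whenever a GPIO push-button is pressed, using gpiozero and picamera2.
from datetime import datetime
from signal import pause

from gpiozero import Button
from picamera2 import Picamera2

camera = Picamera2()
camera.configure(camera.create_still_configuration())
camera.start()

button = Button(17)  # assumed GPIO pin for the shutter switch

def capture_still():
    # Save a timestamped JPEG for the GenAI pipeline to pick up next.
    filename = datetime.now().strftime("capture_%Y%m%d_%H%M%S.jpg")
    camera.capture_file(filename)
    print(f"Captured {filename}")

button.when_pressed = capture_still
pause()  # keep the script alive, waiting for button presses
```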

In the first step, the captured photo is sent through Midjourney's Describe AI system — which inspects an image and then generates a textual description which may or may not accurately match the contents of the picture. This description is then fed back through a second Midjourney AI, Imagine, which turns the description back into an image — but one which has never existed in the real world.
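Aubury's write-up doesn't spell out the glue code, and Midjourney's Describe and Imagine features are operated through its Discord bot rather than a public Python API, so the chain can only be sketched conceptually. In the snippet below, describe_image() and imagine_from_prompt() are hypothetical stubs standing in for whatever mechanism submits those commands, and the prompt wording is illustrative.

```python
# Conceptual sketch of the two-step GenAI chain; the helpers are placeholders.

def describe_image(photo_path: str) -> str:
    """Hypothetical: ask a describe-style model for a caption of the photo."""
    raise NotImplementedError("submit the photo to a /describe-style service here")

def imagine_from_prompt(prompt: str) -> str:
    """Hypothetical: ask an imagine-style model for an image, return its path."""
    raise NotImplementedError("submit the prompt to an /imagine-style service here")

def genai_reimagine(photo_path: str, style: str = "") -> str:
    # Step 1: turn the captured photo into a textual description.
    description = describe_image(photo_path)

    # Step 2: optionally append the selected style, then generate a brand-new
    # image from the text alone, one that never existed in the real world.
    prompt = f"{description}, in the style of {style}" if style else description
    return imagine_from_prompt(prompt)
```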

The camera includes a "filter" control which adds style instructions to the text prompt. (📷: Simon Aubury)

To provide some degree of artistic control, Aubury added a rotary switch that lets a range of styles, from "pop art" to "anime," be appended to the image-generation prompt. A Python program running on the Raspberry Pi then takes the generated image and collages it with the original capture and the text prompt that links the two.
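That collage step could be approximated with a few lines of Pillow; the snippet below is a guess at the approach rather than Aubury's actual program, with the side-by-side layout and caption space chosen purely for illustration.

```python
# Assumed sketch of the collage step: original capture and generated image
# side by side, with the text prompt rendered underneath, using Pillow.
from PIL import Image, ImageDraw

def make_collage(original_path, generated_path, prompt, out_path="collage.jpg"):
    original = Image.open(original_path).convert("RGB")
    generated = Image.open(generated_path).convert("RGB").resize(original.size)

    width, height = original.size
    caption_height = 60  # assumed space for the prompt text
    canvas = Image.new("RGB", (width * 2, height + caption_height), "white")

    canvas.paste(original, (0, 0))
    canvas.paste(generated, (width, 0))

    draw = ImageDraw.Draw(canvas)
    draw.text((10, height + 10), prompt, fill="black")  # default PIL font

    canvas.save(out_path)
    return out_path
```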

"I had so much fun building the GenPiCam camera — and this was an interesting path for exploring prompt engineering for Generative AI. The better photos were the ones which had a simple composition — essentially images that were easy to put words to. The GenPiCam has been a fun way to explore Generative AI, transforming photos into stylized (and sometime surprising) images."

Aubury's full project write-up is available on Medium.

