We've seen a dramatic increase in the capabilities of artificial intelligence (AI) over the last few years, and the tech community is still trying to figure out the potential applications and ramifications. Models like ChatGPT and DALL-E have obvious use cases in the world of content creation (something professional content creators like me find troubling), but makers find more innovative uses for the technology every day. Mina Fahmi took advantage of several different AI services to create Project Ring, a hand-worn device that perceives the world and communicates what it sees to the user.
Project Ring is a small device that straps onto the top of the user's hand and includes a ring extension worn on the user's index finger. The main unit on the hand houses most of the hardware necessary for processing, while the ring unit contains a joystick for user interaction and a camera to look at its surroundings. When the user points the camera at something, Project Ring will analyze what it sees and provide a spoken description to the user through their headphones. It also listens for user commands to aid in interaction.
All of this works using existing AI services that anyone can use. The kicker is that Fahmi also programmed the entire system with the help of an AI service (GPT-4). So, in a manner of speaking, an AI created this device that uses AI.
Of course, Fahmi still had to conceptualize the system, guide the AI programming, devise a hardware strategy, design the 3D-printed parts, and assemble everything. The primary piece of hardware is a Raspberry Pi Zero W single-board computer, which accepts input from the joystick and camera. It communicates with Google Cloud Run to access the various AI services needed to make this all come together: image-to-text, speech-to-text, text-to-text, and text-to-speech. The Raspberry Pi doesn't do any of that processing itself and instead offloads everything to these cloud services, meaning that the device requires an internet connection.
Project Ring speaks to the user through their Android phone and a headset. If, for example, the user asks Project Ring (with a voice command) to describe what it sees, it captures an image with the camera, runs that image through the image-to-text service, and then runs the resulting description through the text-to-speech service to generate the audio fed to the user's headset.
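That "describe what you see" flow can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Fahmi's actual code: the function names are invented, and each cloud call (which the real device would route through Google Cloud Run) is replaced by a local stub so the pipeline's shape is clear and runnable offline.

```python
# Hypothetical sketch of Project Ring's "describe the scene" pipeline.
# Each stub stands in for a step the real device offloads to a cloud
# AI service; only the overall flow reflects the article's description.

def capture_image() -> bytes:
    """Stand-in for grabbing a frame from the ring-mounted camera."""
    return b"\x89PNG...fake-frame-data"

def image_to_text(image: bytes) -> str:
    """Stand-in for a cloud image-captioning (image-to-text) service."""
    return "A coffee mug sitting on a wooden desk."

def text_to_speech(text: str) -> bytes:
    """Stand-in for a cloud text-to-speech service; returns audio bytes."""
    return b"RIFF" + text.encode("utf-8")  # fake WAV-like payload

def describe_scene() -> bytes:
    """Full pipeline: camera frame -> text caption -> spoken audio."""
    frame = capture_image()
    caption = image_to_text(frame)
    return text_to_speech(caption)

audio = describe_scene()
print(type(audio).__name__)
```

In the real device, each stub would instead be an HTTP request from the Raspberry Pi to a cloud endpoint, with the final audio streamed to the user's headset via their phone.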
And while Project Ring is mostly an experiment in what one can achieve with AI, it could have real-world benefits for people who have impaired vision.