How to Avoid a Cat-astrophe
This AI cat detection system uses a Raspberry Pi and the GPT-4 Vision API to narrate every move your cat makes like a nature documentary.
New cat owners should expect to be stripped of any illusion that their new friend can be trained within the first five minutes of life together. Unlike our canine companions, cats are well-known for their individuality and strong personalities. But the first time they purr and brush up against their owner’s leg, the battle has already been won. The cat can stay and make its own rules in exchange for absolutely nothing, and yet it somehow seems like a good bargain.
Needless to say, we cannot expect consistently good behavior from our furry friends. The best a cat owner can hope for is to manage the problems. That is what engineer and cat lover Yoko Li of San Francisco has learned to do — but with a high-tech twist. Li built an AI-powered cat detection system to keep an eye on her kitties and make sure they do not jump up on the table or kitchen counter while she is away. Or rather, it will send her an email to let her know that they are being bad without actually stopping them. Come to think of it, this might actually be a win for the cats. This system lets them taunt their owners as they flaunt the rules while we are away. Ugh! Outsmarted once again!
But cat detection is not the most interesting part of this build by any stretch of the imagination. It also doubles as a narrator that describes everything your cat is doing as if you were watching a documentary voiced by David Attenborough. In one demonstration, a cat walking on the kitchen counter near the sink was described as “a glorious tabby cat closely investigating the human’s watering hole, brimming with feline curiosity as it paws cautiously at the mysteries contained within the shiny metal basin.” If that’s not masterful, I don’t know what is.
The entire system runs on a Raspberry Pi single-board computer, with a camera attached to capture images. Those images are then sent to OpenAI’s GPT-4 Vision API along with a text prompt specifying the style in which to describe the content of those images. The textual response is then converted into an audio file using an ElevenLabs text-to-speech model. That audio file, in turn, is played through a speaker. And if the narrator sees anything naughty, Li will receive an email about the incident.
If you would like to use this tool beyond narrating every action your cat takes in the course of the day, Li suggests that it can also be used for bird watching (to summarize all of the birds that were seen in a day), as a raccoon deterrent, or, if you have had enough of nature (is that possible?), to let you know when a package has been delivered.
Source code is available on GitHub and has been released under a very permissive MIT license. There is also some nice documentation available to help you get up and running quickly. The documentation makes it clear, however, that no cats are included in the repository. You will be expected to supply your own cats for your projects.