Last week, I got a message from a guy worried about the voice notes his girlfriend sends from the balcony when she goes out for a smoke. He wanted help breaking into her WhatsApp to settle some suspicions. Now, I believe relationship issues should be solved by talking—not by hacking someone’s phone. That said, I was intrigued by the idea of building an autonomous little AI-powered gadget for such an absurd purpose.
So I built the Guamp App (which is not actually an app), a proof-of-concept device that:
- Detects when the girlfriend is on the balcony
- Records her voice messages
- Transcribes the audio
- Uses AI to analyze whether the content sounds suspicious
- Sends the results to the boyfriend via Telegram
At first, I considered using a Raspberry Pi with Python. But then I received the ESP32S3 AI Cam module, and it got me thinking: could this tiny board handle the whole job?
What is the AI Camera Module 1.0 (DFR1154)?

It's a 1.5" x 1.5" ESP32-S3-based board with:
- A 2MP OV3660 wide-angle IR camera
- An onboard I2S PDM microphone
- A microSD card slot
- Built-in LEDs
- An amplifier for speaker output (I don't have the micro speaker, but we don't need it for this project)
General Workflow

- Detect the girlfriend on the balcony: Train a machine learning model with photos, deploy it to the camera, take a picture every few seconds, and run inference. If the result is above a confidence threshold, assume it's her.
- Record ambient audio: Start recording when the picture detection is triggered.
- Transcribe the audio: Send the recorded audio to OpenAI's Speech-to-Text API (Whisper).
- Analyze the transcript: Send the text (plus some context, such as names) to ChatGPT to check for suspicious content.
- Send results via Telegram: Use a bot to notify the boyfriend remotely.
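The transcription step comes down to a multipart/form-data POST that carries the audio file and the model name, sent with an `Authorization: Bearer <key>` header and a `Content-Type: multipart/form-data; boundary=...` header. Here is a minimal sketch of how the framing around the audio bytes could be assembled, written in plain C++ so it can be checked off-device. The struct, the function names, and the boundary string are hypothetical and are not taken from the project's sketch; only the `model` and `file` field names come from the transcription endpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: the text sent before and after the raw WAV bytes.
struct MultipartFraming {
    std::string head;  // form fields and file part headers
    std::string tail;  // closing boundary
};

// Builds the framing for a request with a "model" field and a "file" part.
MultipartFraming buildTranscriptionFraming(const std::string& boundary,
                                           const std::string& model,
                                           const std::string& filename) {
    assert(!boundary.empty());
    MultipartFraming f;
    f.head = "--" + boundary + "\r\n"
             "Content-Disposition: form-data; name=\"model\"\r\n\r\n" +
             model + "\r\n"
             "--" + boundary + "\r\n"
             "Content-Disposition: form-data; name=\"file\"; filename=\"" +
             filename + "\"\r\n"
             "Content-Type: audio/wav\r\n\r\n";
    f.tail = "\r\n--" + boundary + "--\r\n";
    return f;
}

// Content-Length must cover the framing plus the audio payload.
std::size_t contentLength(const MultipartFraming& f, std::size_t audioBytes) {
    return f.head.size() + audioBytes + f.tail.size();
}
```

Splitting the body into a head and a tail means the audio can be streamed between them in small chunks, so the whole request never has to sit in RAM at once.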
Computer Vision

For my demo, I used a general-purpose person detection model. To replicate it:
- Download this ZIP file: PersonDetectionInferencing.zip
- In Arduino IDE, go to Sketch > Include Library > Add .ZIP Library and add it.

To train your own model with photos of a specific person, here's a sample project showing the process: Computer Vision for Alvik Robot. Then:
- Deploy the trained model for Arduino and unzip it to Documents/Arduino/libraries
- Replace depthwise_conv.cpp and conv.cpp in src/edge-impulse-sdk/tensorflow/lite/micro/kernels with the patched versions from https://github.com/ronibandini/guampapp/tree/main/edgeimpulse
- Download the camera example code: https://github.com/ronibandini/guampapp/tree/main/edgeimpulse/edgecamera
- Move it to Documents/Arduino/libraries/modelFolder/examples
- Open the example in Arduino IDE and edit the header to point to your model, e.g. #include <Girlfriend_inferencing.h>
- If the demo runs fine, use the same include in your main sketch, guampAppUpload.ino
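The decision that follows each inference is a simple comparison against a confidence threshold. The sketch below shows one way to express it in plain C++. The `Prediction` struct is a hypothetical stand-in for whatever the generated inferencing library returns, and the label and threshold values in the usage note are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the model's output.
struct Prediction {
    std::string label;
    float confidence;  // 0.0 to 1.0
};

// Returns true when any prediction for targetLabel reaches the threshold.
bool isTargetDetected(const std::vector<Prediction>& predictions,
                      const std::string& targetLabel,
                      float threshold) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    for (const Prediction& p : predictions) {
        if (p.label == targetLabel && p.confidence >= threshold) {
            return true;
        }
    }
    return false;
}
```

For example, with predictions of 0.82 for "person" and 0.18 for "background", a threshold of 0.7 triggers and a threshold of 0.9 does not. A higher threshold gives fewer false triggers at the cost of more missed detections.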
Install the Universal Telegram Bot library.

You'll need:
- An OpenAI API key (for transcription and analysis)
- A Telegram bot token (for sending notifications)

Here's a project guide explaining how to create a Telegram bot and get the token.
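The Telegram library takes care of the HTTP details, but it helps to know what a notification looks like on the wire: a request to the Bot API's `sendMessage` method on api.telegram.org, with the token in the path and `chat_id` and `text` as parameters. Here is a rough sketch of building that path in plain C++. The function names are hypothetical, and the token and chat ID in the usage note are placeholders.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encodes everything except RFC 3986 unreserved characters.
std::string urlEncode(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminator
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Builds the request path for the Bot API sendMessage method.
std::string buildSendMessagePath(const std::string& botToken,
                                 const std::string& chatId,
                                 const std::string& text) {
    assert(!botToken.empty());
    return "/bot" + botToken + "/sendMessage?chat_id=" + urlEncode(chatId) +
           "&text=" + urlEncode(text);
}
```

Calling `buildSendMessagePath("123:ABC", "42", "hi there!")` yields `/bot123:ABC/sendMessage?chat_id=42&text=hi%20there%21`. Encoding the text matters because a transcript will contain spaces and punctuation.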
Software Configuration

#define REC_TIME 30 // recording time in seconds
#define BOT_TOKEN "" // Telegram bot token
String chatOperativo = ""; // chat ID for Telegram
String imgUrl = "http://someserver/guampapp.png";
String inseguro = "John"; // boyfriend
String pareja = "Ana"; // girlfriend
String terceroa = "Brad"; // suspected third party
const char* ssid = "";
const char* password = "";
const char* openai_server = "api.openai.com";
const char* openai_api_endpoint = "/v1/audio/transcriptions";
const char* openai_api_key = "";
const char* model = "whisper-1";
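REC_TIME directly determines how much audio data the board has to hold or stream. The sketch below works out the buffer size and builds the standard 44-byte PCM WAV header in plain C++. The 16 kHz, 16-bit, mono format in the usage note is an assumption made for illustration and is not taken from the project's code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes of raw PCM needed for a recording of the given length.
uint32_t pcmBytes(uint32_t seconds, uint32_t sampleRate,
                  uint16_t bitsPerSample, uint16_t channels) {
    assert(sampleRate > 0 && bitsPerSample % 8 == 0);
    return seconds * sampleRate * (bitsPerSample / 8u) * channels;
}

// Appends a little-endian integer that is `bytes` wide.
static void putLE(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Appends a four-character chunk tag such as "RIFF".
static void putTag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Builds the canonical 44-byte header for uncompressed PCM audio.
std::vector<uint8_t> wavHeader(uint32_t dataBytes, uint32_t sampleRate,
                               uint16_t bitsPerSample, uint16_t channels) {
    std::vector<uint8_t> h;
    uint16_t blockAlign = static_cast<uint16_t>(channels * (bitsPerSample / 8));
    putTag(h, "RIFF"); putLE(h, 36 + dataBytes, 4);   // file size minus 8
    putTag(h, "WAVE");
    putTag(h, "fmt "); putLE(h, 16, 4);               // fmt chunk size
    putLE(h, 1, 2);                                   // format 1 = PCM
    putLE(h, channels, 2);
    putLE(h, sampleRate, 4);
    putLE(h, sampleRate * blockAlign, 4);             // byte rate
    putLE(h, blockAlign, 2);
    putLE(h, bitsPerSample, 2);
    putTag(h, "data"); putLE(h, dataBytes, 4);
    return h;
}
```

Under those assumed settings, `pcmBytes(30, 16000, 16, 1)` is 960,000 bytes. That is more than the ESP32-S3's 512 KB of internal SRAM, so a 30-second clip would have to live in external PSRAM or be streamed to the microSD card.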
Use the serial monitor to debug the workflow: Wi-Fi connection, person detection, audio recording, transcription, and analysis.
I designed the case in Fusion 360 and printed it in PLA on a Bambu Lab A1 mini. It's a 2-piece case secured with 3 mm screws, and it includes a standard photo tripod mount. The files can be downloaded from Cults3D.
If you want to improve or repurpose this project, you can make it smaller by powering it from a 3.7V battery with a TP4056 charger. You could also trigger recording on an audio level threshold to capture full conversations instead of fixed-length clips, and consider analyzing the photos alongside the audio.
Beyond the starting point, which is as questionable as it is anecdotal, it's still fascinating that an $18 module can run an ML model locally to detect people, record audio, and, with the help of cloud APIs, transcribe it, analyze it with AI, and send messages.