In this project, I built a real-time AI face analytics camera using **ReCamera**.
Resources
- GitHub Repository: [RobotXTeam/sscma-example-sg200x]
- Pre-trained Models: [sscma-example-sg200x v1.0.1 Release]
- Detailed Deployment Wiki: [ReCamera UDP Face Analysis Wiki]
For the complete step-by-step deployment process, including environment setup, model preparation, compilation, running the ReCamera executable, and launching the PC receiver, please refer to the detailed wiki tutorial above.
This demo turns ReCamera into a compact edge AI camera that can detect human faces, estimate age/gender/race attributes, recognize emotions, and stream annotated video results to a PC in real time over UDP.
Unlike a simple object detection demo, this project combines multiple AI tasks into one complete edge vision pipeline:
- Face detection with a YOLO face model
- Age, gender, and race estimation with a FairFace-based model
- Emotion recognition with a 7-class emotion model
- JPEG frame compression
- UDP video and metadata streaming
- Real-time visualization on a PC using Python and OpenCV
The result is a self-contained device that analyzes faces locally and shows the output live on a computer.
Many AI camera applications rely on cloud processing. While cloud AI is powerful, it also introduces several problems:
- Higher latency
- More network bandwidth usage
- Privacy concerns
- Dependency on internet connectivity
For many real-world applications, it is better to run AI inference directly on the device.
That is why I wanted to build a real-time face analytics system on ReCamera. The goal was to demonstrate how an edge AI camera can handle video capture, AI inference, and real-time streaming without sending raw video to a cloud server.
This type of system can be used as a starting point for applications such as:
- Smart retail analytics
- Interactive kiosks
- Classroom attendance systems
- Human-computer interaction
- Edge AI camera prototyping
- Embedded computer vision education
This project is mainly intended for development, research, and demonstration purposes. When working with face analytics, always make sure that people are informed and have given consent.
How the System Works

The system is divided into two parts:
- **ReCamera side:** The ReCamera runs a C++ application. It captures video frames, runs AI inference, compresses the image, and sends the results through UDP.
- **PC side:** The PC runs a Python receiver script. It receives JPEG frames and detection metadata, decodes them, draws bounding boxes and labels, and displays the final video stream in real time.
The overall architecture is shown below:
+--------------------------------------------------+
| ReCamera |
|--------------------------------------------------|
| Camera Capture |
| | |
| v |
| YOLO Face Detection |
| | |
| v |
| Age / Gender / Race Analysis |
| | |
| v |
| Emotion Recognition |
| | |
| v |
| JPEG Compression + Metadata Packaging |
| | |
+--------|-----------------------------------------+
|
| UDP Stream
v
+--------------------------------------------------+
| PC |
|--------------------------------------------------|
| Python UDP Receiver |
| OpenCV JPEG Decoding |
| Draw Face Boxes and Attribute Labels |
| Real-Time Display |
+--------------------------------------------------+

AI Pipeline

The AI pipeline contains three main stages.
1. Face Detection

The first model is a YOLO-based face detection model. It detects faces in the camera frame and outputs bounding boxes with confidence scores.
A confidence threshold can be configured when launching the program. This allows the user to balance sensitivity and false positives.
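As a sketch, the thresholding step can be expressed like this. The detection dict layout is an illustrative assumption; the real detection structure lives in the C++ application.

```python
# Sketch of confidence-threshold filtering. The dict layout here is an
# assumption for illustration; the real detections come from the C++ app.
def filter_detections(detections, threshold=0.5):
    """Keep only face boxes whose confidence meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]

detections = [
    {"box": (40, 30, 120, 110), "score": 0.92},
    {"box": (200, 50, 260, 115), "score": 0.31},  # likely a false positive
]
kept = filter_detections(detections, threshold=0.5)
print(len(kept))  # 1: only the high-confidence face survives
```

Raising the threshold suppresses false positives at the cost of missing low-confidence faces, which is exactly the trade-off the launch parameter exposes.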
2. Attribute Analysis

After a face is detected, the cropped face region is passed to an age/gender/race attribute model.
The model estimates:
- Gender
- Age range
- Race category
These results are displayed as text labels near the detected face.
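A minimal sketch of how the three output heads could be mapped to text labels. The label lists follow the public FairFace categories; the exact classes of the on-device model may differ.

```python
# Label lists follow the public FairFace categories; the on-device
# model's exact class set is an assumption here.
GENDERS = ["Male", "Female"]
AGES = ["0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
RACES = ["White", "Black", "Latino_Hispanic", "East Asian",
         "Southeast Asian", "Indian", "Middle Eastern"]

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def decode_attributes(gender_logits, age_logits, race_logits):
    """Pick the highest-scoring class from each output head."""
    return (GENDERS[argmax(gender_logits)],
            AGES[argmax(age_logits)],
            RACES[argmax(race_logits)])

label = decode_attributes([0.2, 0.8],
                          [0, 0, 0, 0.9, 0, 0, 0, 0, 0],
                          [0.1, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1])
print(label)  # ('Female', '20-29', 'East Asian')
```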
3. Emotion Recognition

The detected face is also passed to an emotion recognition model.
The emotion model supports seven classes:
- Angry
- Disgust
- Fear
- Happy
- Sad
- Surprise
- Neutral
This makes the demo more interactive, because the displayed label can change in real time as the user changes facial expressions.
Real-Time UDP Streaming

After inference, the ReCamera compresses the video frame as JPEG and sends it to the PC through UDP.
I chose UDP because it is lightweight and suitable for real-time video streaming. In this demo, occasional packet loss is acceptable because the system continuously sends new frames.
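The exact wire format is defined by the C++ sender. As an illustrative assumption only, a single datagram could carry a small binary header, JSON metadata, and the JPEG payload:

```python
import json
import struct

# Hypothetical datagram layout: frame id + metadata length + JSON metadata
# + JPEG bytes. The real wire format is defined by the C++ sender, so this
# 8-byte header is an assumption, not the project's actual protocol.
HEADER = struct.Struct("!II")  # frame_id, metadata_length (network byte order)

def build_packet(frame_id, faces, jpeg_bytes):
    """Package one frame's metadata and JPEG payload into a datagram."""
    meta = json.dumps({"faces": faces}).encode("utf-8")
    return HEADER.pack(frame_id, len(meta)) + meta + jpeg_bytes

def parse_packet(packet):
    """Inverse of build_packet: recover frame id, metadata, JPEG bytes."""
    frame_id, meta_len = HEADER.unpack_from(packet)
    meta = json.loads(packet[HEADER.size:HEADER.size + meta_len])
    return frame_id, meta, packet[HEADER.size + meta_len:]

pkt = build_packet(7, [{"box": [40, 30, 120, 110], "emotion": "Happy"}],
                   b"\xff\xd8...")
fid, meta, jpeg = parse_packet(pkt)
print(fid, meta["faces"][0]["emotion"])  # 7 Happy
```

Keeping the metadata in the same datagram as the frame avoids having to match boxes to images across separate packets when some are lost.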
The PC receiver script does the following:
- Receives UDP packets from ReCamera
- Reconstructs the JPEG frame and metadata
- Decodes the image with OpenCV
- Draws face bounding boxes and labels
- Displays the final result in a real-time window
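The receive step can be sketched with only the standard library; the OpenCV decoding and drawing calls are noted in comments, and the loopback sender below exists only to make the snippet self-contained.

```python
import socket

# Skeleton of the PC-side receive step (stdlib only). The real script would
# split the datagram into metadata + JPEG bytes, decode with cv2.imdecode,
# draw boxes with cv2.rectangle / cv2.putText, and display with cv2.imshow.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))   # demo uses loopback; the real receiver binds 0.0.0.0:5001
rx.settimeout(2.0)
port = rx.getsockname()[1]

# Pretend to be the ReCamera for one frame so the snippet runs standalone.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"\xff\xd8 fake-jpeg", ("127.0.0.1", port))

data, addr = rx.recvfrom(65507)       # 65507 bytes = maximum UDP payload
print(data.startswith(b"\xff\xd8"))   # True: JPEG data begins with the SOI marker
tx.close()
rx.close()
```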
This makes the system easy to test. The ReCamera handles AI inference, while the PC provides a convenient visualization interface.
Performance Optimization

Running multiple AI models on an embedded device is challenging. To keep the system responsive, I used several optimization strategies.
Skip-Frame Inference

Instead of running AI inference on every video frame, the program can run inference every N frames.
For example, with a skip value of 3, the system only performs inference once every three frames. This reduces the AI workload while keeping the visual output smooth enough for real-time viewing.
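The skip-frame idea in miniature, with a stand-in `infer` function in place of the real models:

```python
# Skip-frame scheduling sketch: the expensive model runs only every Nth
# frame, while cached results are reused on every frame for smooth video.
# `infer=len` is a stand-in for the real YOLO + attribute model calls.
def process(frames, skip=3, infer=len):
    calls, cached, annotated = 0, None, []
    for i, frame in enumerate(frames):
        if i % skip == 0:
            cached = infer(frame)    # expensive inference, every Nth frame
            calls += 1
        annotated.append((frame, cached))  # every frame is still drawn/streamed
    return annotated, calls

annotated, calls = process(["f%d" % i for i in range(9)], skip=3)
print(calls)  # 3 inference calls for 9 frames
```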
JPEG Compression

Raw video frames are too large for efficient UDP streaming. Therefore, the frame is compressed into JPEG before transmission.
This greatly reduces bandwidth usage and makes real-time streaming more practical.
Configurable Threshold and Mode

The program supports different YOLO head types and detection thresholds. This makes it easier to adjust the system for different models or environments.
For example:
./face_udp yolo-face_mixfp16.cvimodel age_gender_race_bf16.cvimodel emotion_bf16.cvimodel multi 0.5 3 192.168.31.100 5001

In this command:
- multi selects the YOLO head type
- 0.5 sets the detection threshold
- 3 means inference runs every 3 frames
- 192.168.31.100 is the PC IP address
- 5001 is the UDP port
When the system is running, the PC displays a real-time video window.
Each detected face is marked with a bounding box, and the label next to it shows the detected attributes and emotion.
On the ReCamera terminal, the program also prints performance statistics, including video FPS, UDP FPS, inference time, and throughput.
These statistics are helpful for understanding the performance of the full edge AI pipeline.
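For reference, an FPS figure like the ones printed can be computed from a window of frame timestamps; the code below uses explicit timestamps so the example is deterministic.

```python
# Rolling-FPS estimate, similar in spirit to the video/UDP FPS figures the
# ReCamera prints. Computed from explicit timestamps for determinism.
def fps(timestamps):
    """Frames per second over a window of frame-arrival timestamps."""
    if len(timestamps) < 2:
        return 0.0
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

# 10 frames spaced 50 ms apart -> 20 FPS
stamps = [i * 0.05 for i in range(10)]
print(round(fps(stamps)))  # 20
```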
What I Learned

This project helped me understand how to build a complete edge AI vision system, not just run a single model.
The most important parts were:
- Integrating multiple AI models into one pipeline
- Balancing inference speed and video smoothness
- Sending image data and metadata over UDP
- Visualizing embedded AI results on a PC
- Optimizing the workflow for real-time performance
The project also shows that ReCamera can be used as a flexible AI camera platform for real-time computer vision applications.
Future Improvements

There are several ways this project can be improved in the future:
- Add face tracking to reduce repeated detection cost
- Add a web dashboard for browser-based visualization
- Support RTSP or WebRTC streaming
- Add local recording of detection results
- Improve packet handling for unstable networks
- Add more models, such as mask detection or gaze estimation
- Deploy the visualization directly on an embedded display