Smart Coop 2.0 is the upgraded version of our original automated chicken coop system. The first version successfully automated basic coop tasks such as door control and feeding. However, we wanted to push the idea further by improving reliability, adding better monitoring, and simplifying the electronics so that others could replicate the project easily.
With this new version, we focused on making the system more modular, easier to assemble, and capable of providing better insight into what is happening inside and around the coop. By combining automation, computer vision, and IoT connectivity, Smart Coop 2.0 aims to make backyard poultry management smarter, safer, and more convenient.
What’s New in Smart Coop 2.0?
- Redesigned door opening mechanism using a DC motor and limit switches for improved reliability and lower power consumption
- Improved feeding system using a stepper motor for more precise feed dispensing
- Dual webcam monitoring system for predator detection and egg monitoring
- Microphone-based sound detection to identify predator sounds around the coop
- Temperature control system with a heating bulb to maintain a comfortable temperature inside the coop
- Integration with Arduino IoT Cloud for easier monitoring and control
- Custom-designed PCB for cleaner wiring and stable electronics integration
- Modular 3D-printed mechanical structure with acrylic panels for easy replication and customization
Smart Coop 2.0 features a modular mechanical structure designed in SolidWorks and optimized for 3D printing. The entire coop is divided into four sections so it fits within a standard printer build volume. These sections are assembled using steel rods, which provide strength and maintain proper alignment of the structure.
Most structural parts are 3D printed, making the design easy to reproduce and customize. The front and rear panels are made from acrylic sheets—the front panel is colored for a cleaner exterior look, while the rear panel is transparent to allow visibility inside the coop.
Threaded inserts are used across the design, with nearly 90% of the components mounted using them. This improves mechanical strength and allows parts to be assembled, removed, or replaced without damaging the printed components.
Main Controller – Arduino UNO Q
The Arduino UNO Q serves as the main controller of Smart Coop 2.0, coordinating all the core functions of the system. It manages the sensors, controls the motors for the door and feeding mechanisms, and handles communication between the different modules inside the coop.
The controller also interfaces with the webcams and microphone through a USB hub, enabling predator detection, egg monitoring, and sound-based detection. In addition, it controls the heating system and monitors environmental conditions to maintain a comfortable environment for the chickens.
Using the Arduino UNO Q as the central controller keeps the system simple, reliable, and easy for others to replicate, while still supporting advanced automation and monitoring features.
When using the Arduino UNO Q for the first time, the board is connected to a computer using a USB-C cable. The Arduino IDE is then installed to allow code to be written and uploaded to the board. After installing the necessary board support packages and drivers, the correct board and serial port are selected in the IDE.
Once the setup is complete, a simple test program can be uploaded to confirm that the board is working correctly.
Door Mechanism Design
The door mechanism in Smart Coop 2.0 largely follows the same core design as the previous version, with a few refinements for improved reliability and control. The system uses a low-power DC gear motor (100 RPM, 12V recommended) capable of delivering up to 6.5 kg·cm torque (235 N·cm stall), which is sufficient for opening the door reliably. The motor draws up to 900 mA under load, keeping overall power consumption low while still providing adequate force.
To drive the motor, a BTS7960 motor driver is used instead of the L298, offering significantly better efficiency and current handling. The BTS7960 supports high current loads (up to 43A peak), operates within a voltage range of approximately 5.5V to 27V, and includes built-in protection features such as overcurrent, overtemperature, and undervoltage protection, making the system more robust during operation.
Additionally, two limit switches are used to define the fully open and fully closed positions of the door. These switches provide precise feedback to the controller, ensuring accurate movement and preventing over-travel.
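The limit-switch logic described above can be sketched as a small decision function. This is only an illustration in Python (the actual firmware is not shown in this write-up); the names `Cmd` and `door_command` and the 20-second timeout are assumptions made for the example, not values from the project.

```python
from enum import Enum


class Cmd(Enum):
    STOP = 0
    OPEN = 1
    CLOSE = 2


def door_command(target_open: bool,
                 open_limit_hit: bool,
                 closed_limit_hit: bool,
                 elapsed_s: float,
                 timeout_s: float = 20.0) -> Cmd:
    """Decide the motor command for one control-loop tick.

    timeout_s is a hypothetical safety limit: if neither switch has
    triggered in time, the door is probably jammed or a switch failed.
    """
    if elapsed_s > timeout_s:
        return Cmd.STOP
    if target_open:
        # Drive towards the open switch and stop as soon as it closes.
        return Cmd.STOP if open_limit_hit else Cmd.OPEN
    # Otherwise drive towards the closed switch.
    return Cmd.STOP if closed_limit_hit else Cmd.CLOSE
```

A timeout like this is a common companion to limit switches, since it keeps the motor from running indefinitely if a switch never triggers.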
An additional improvement is the belt tensioning mechanism, which allows the user to adjust the belt tension if needed. This helps maintain smooth door operation over time and makes maintenance easier.
Smart Coop 2.0 uses webcams and a microphone to monitor the coop and detect potential threats. Instead of the previous HuskyLens sensor, the new design uses Logitech 1080p webcams, which provide better image quality and a wider field of view. One camera is mounted on the front of the coop to monitor the surroundings and detect potential predators, while another camera is placed inside the coop to observe the nesting area and detect eggs.
Along with the cameras, a microphone is also included for sound-based predator detection. The microphone allows the system to identify suspicious sounds such as barking, growling, or other animal noises that may indicate the presence of predators near the coop.
Since the Arduino UNO Q has only a single USB-C port, both webcams and the microphone are connected through a USB hub, enabling multiple peripherals to interface with the controller simultaneously. This setup allows the system to combine visual and audio monitoring for improved detection and overall coop safety.
For the vision tasks, we trained YOLOv8 models to perform predator detection and egg detection. This involved collecting and labeling image datasets of common predators such as snakes, foxes, and stray dogs, as well as images of eggs inside the nesting area. The datasets were labeled and prepared using annotation tools, then trained using the Ultralytics YOLOv8 framework. After training, the models were exported in ONNX format so they can run efficiently on the system.
For sound-based detection, the system uses YAMNet, a pre-trained audio classification model from TensorFlow. YAMNet is capable of recognizing a wide range of environmental sounds, including animal calls. By processing the audio captured by the microphone, the model can identify predator-related sounds and trigger alerts alongside the vision-based detection.
By combining computer vision and audio-based machine learning, Smart Coop 2.0 can monitor both visual and acoustic signals to provide a more reliable and intelligent coop monitoring system.
Vision ML Models For Predator And Egg Detection
The Smart Coop runs two AI models at the same time. One watches the nesting boxes and counts eggs. The other monitors the coop perimeter and raises an alert the moment a predator shows up. Both models are built on the same neural network architecture, trained on completely different datasets for completely different tasks, and deployed together on the Arduino UNO Q.
Vision Model Architecture
Both models use YOLOv8-Nano from Ultralytics, a single-stage, anchor-free object detector that processes an image once and outputs every detection in a single forward pass.
1. The Backbone reads the image and builds feature representations at three scales: a 40×40 grid retaining fine spatial detail for small objects like eggs, a 20×20 grid for medium-sized objects, and a 10×10 grid capturing the broad scene context needed for large animals.
2. The Neck fuses these three scales together with a top-down then bottom-up pass, so every scale has the benefit of context from the others.
3. The Detection Head attaches to each fused scale and predicts bounding box coordinates and class scores for every spatial position. For 320×320 input, the three heads together evaluate 40×40 + 20×20 + 10×10 = 2100 candidate positions (the often-quoted figure of 8400 applies to 640×640 input).
The raw output tensor is [1, 4+nc, 2100] where nc is the number of classes: [1, 5, 2100] for the egg detection model and [1, 14, 2100] for the predator detection model. A Non-Maximum Suppression (NMS) step then filters overlapping predictions, keeping only the highest-confidence box per object.
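To make the candidate count and the NMS step concrete, here is a minimal pure-Python sketch. It assumes the standard YOLOv8 strides of 8, 16 and 32 (which match the 40/20/10 grids above for a 320×320 input) and a greedy NMS with a hypothetical IoU threshold of 0.45; the function names are illustrative and not taken from the project code.

```python
def candidate_count(imgsz, strides=(8, 16, 32)):
    """Number of grid positions the three detection heads evaluate."""
    return sum((imgsz // s) ** 2 for s in strides)


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the best box, drop any box overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Two overlapping boxes on one object plus one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

In a deployment built on OpenCV, the same filtering is usually delegated to a library routine rather than written by hand; the sketch is only meant to show what the step does.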
Both models start from COCO pre-trained weights. The backbone already knows how to extract meaningful visual features from 118,000 images across 80 categories, so we are redirecting that existing knowledge rather than building from scratch. This is why both models reach good performance in a relatively short training run. The network has 3.2 million parameters, needs ~4.1 GFLOPs to process a 320×320 frame, and exports to a ~12 MB ONNX file (or ~3.5 MB with INT8 quantization).
Egg Detection Model
The egg model scans the camera frame, draws a bounding box around every egg it finds, and reports the count. That count flows to the STM32 side for logging, display, or triggering a collection reminder. It detects a single class, egg, so every detection is binary: egg or nothing.
The Dataset
We sourced the dataset from Roboflow Universe. The dataset covers a good range of real-world conditions. You will find white eggs and brown eggs, eggs sitting in nesting boxes, eggs loose on the coop floor, single eggs and full clutches, eggs photographed close up and eggs small in the frame with chickens in the background. That variety is deliberate: the more diverse the training images, the better the model handles whatever your specific coop setup looks like on any given day.
After downloading, the dataset is structured into three splits. The training set is what the model actually learns from. The validation set is used during training to check how well the model is generalising to images it has never seen — this is what drives early stopping. The test set is held back entirely and only used at the very end to evaluate the final model on completely unseen data.
Labels are stored in YOLO format. Each image has a corresponding .txt file where every line describes one egg:
0 <cx_normalised> <cy_normalised> <width_normalised> <height_normalised>
The class index is always 0 (egg) and all coordinates are normalised between 0 and 1 relative to the image dimensions.
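As a quick illustration of the label format, the sketch below converts one YOLO label line into a pixel-space corner box. The helper name `yolo_to_pixels` is made up for this example.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalised) to (cls, (x1, y1, x2, y2)) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# An egg centred in a 320x320 frame, a quarter of the width and half the height.
cls_id, box = yolo_to_pixels("0 0.5 0.5 0.25 0.5", 320, 320)
```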
Augmentation
Because the dataset is moderate in size rather than enormous, the augmentation pipeline has to work hard during training to expose the model to enough variation. These transformations are applied on-the-fly: they do not change the actual image files, they just generate modified versions during each training epoch.
hsv_h=0.015, # hue shift ±1.5% — handles variation across egg colours (white/brown/green)
hsv_s=0.7, # saturation ±70% — handles different lighting conditions throughout the day
hsv_v=0.4, # brightness ±40% — heat lamp at night vs. daylight vs. overcast
degrees=15, # rotation ±15° — eggs roll and sit at angles in nesting material
translate=0.1, # position shift ±10%
scale=0.5, # scale ±50% — eggs at different distances from the camera
fliplr=0.5, # 50% horizontal flip — eggs look the same mirrored
flipud=0.0, # no vertical flip — eggs don't appear upside down in a nesting box
mosaic=1.0, # mosaic always on — tiles 4 images per sample (most impactful augmentation)
mixup=0.0, # disabled: single class, no inter-class boundaries to learn
The HSV colour jitter is one of the most important augmentations for this specific use case. It randomly shifts the hue by up to 1.5%, the saturation by up to 70%, and the brightness by up to 40% on every training image. The reason this matters so much in a coop environment is that lighting conditions change dramatically throughout the day and across seasons. The same nesting box looks completely different under an incandescent heat lamp at night, early morning natural light, and direct afternoon sun through a coop window. On top of that, eggs themselves vary in colour from stark white to deep brown to pale green depending on the breed. Without aggressive colour jitter, the model would learn to recognise eggs only under the conditions present in the training images and would struggle the moment the lighting shifts.
Rotation up to ±15 degrees handles the reality that eggs are not always sitting perfectly upright in the frame. They roll, they sit at angles in the nesting material, and depending on your camera angle they may appear tilted.
Scale variation of ±50% simulates eggs at different distances from the camera, which is useful if the camera is mounted at a height where the apparent size of an egg changes with where in the frame it sits. Horizontal flip at 50% probability is applied because eggs look the same from both sides, and doubling the effective dataset size with mirror images costs nothing. Vertical flip is deliberately turned off. Eggs in a nesting box simply do not appear upside down, and including vertically flipped images would train the model on physically impossible scenarios.
Mosaic augmentation deserves specific attention because it is the most impactful augmentation in this pipeline. With mosaic enabled at full probability, every training sample is actually a composite of four different images stitched into a single frame, with each image occupying one quadrant. The model has to detect eggs in each quadrant simultaneously, which teaches it several things at once: detecting objects near image boundaries, detecting objects that are partially cut off, and handling multiple objects in a single frame. For a dataset of this size, mosaic effectively quadruples the number of unique training scenarios the model encounters per epoch.
Mosaic is disabled for the last 10 epochs of training via the `close_mosaic=10` setting. This is a deliberate fine-tuning phase. Once the model has learned the core task through augmented data, it benefits from spending its final epochs training on clean, natural images. This typically gives a small but consistent improvement in the final validation metrics.
Training Configuration
from ultralytics import YOLO
model = YOLO('yolov8n.pt') # start from COCO pre-trained weights
model.train(
data = 'egg_detection_model/dataset/data.yaml',
epochs = 30, # full run — early stopping has patience=20
imgsz = 320,
batch = 16,
device = 'cuda', # or 'mps' / 'cpu'
optimizer= 'AdamW', # faster convergence than SGD on short runs
lr0 = 0.001, # initial learning rate
lrf = 0.01, # final lr = lr0 * lrf → decays to 0.00001
momentum = 0.937,
weight_decay = 0.0005,
warmup_epochs = 3,
warmup_momentum= 0.8,
warmup_bias_lr = 0.1,
patience = 20, # stop if val mAP50 doesn't improve for 20 epochs
cache = True, # keep decoded images in RAM to speed up later epochs
project = 'egg_detection_model',
name = 'yolov8n_eggs',
hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
degrees=15, translate=0.1, scale=0.5,
fliplr=0.5, flipud=0.0, mosaic=1.0, mixup=0.0,
save=True, save_period=10, plots=True, verbose=True,
)
We use AdamW over SGD because with a 30-epoch budget it converges faster and more predictably: AdamW gives each parameter its own adaptive learning rate based on gradient history, which stabilises training across layers learning at different speeds. Patience of 20 is wide enough to ride through temporary plateaus without cutting the run short. cache=True keeps decoded images in RAM after the first epoch, removing disk I/O from subsequent epochs.
The loss is a weighted sum of CIoU box regression (7.5), Binary Cross-Entropy classification (0.5 - low because a single class is trivial to classify), and Distribution Focal Loss (1.5), which models boundary uncertainty as a distribution rather than a point, helping with the ambiguous egg-to-nesting-material edge.
Training Results
Training ran for the full 30 epochs. Best checkpoint at epoch 22:
| Metric | Value |
|-----------|-------|
| mAP@50 | 0.965 |
| mAP@50-95 | 0.699 |
| Precision | 0.887 |
| Recall | 0.956 |
Training and validation losses tracked each other closely throughout (train 1.014, val 1.009 at epoch 30), confirming the model generalised well without overfitting. By epoch 10 mAP@50 was already at 0.957, showing the strong head start from COCO pre-training.
Predator Detection Model
The predator model identifies threats around the coop and reports which species it found, not just that something is there. A fox outside warrants a logged alert; a snake inside the coop warrants an immediate alarm. The model detects ten species:
CLASS_NAMES = [
'Bobcat', # 0
'Cougar', # 1
'Coyote', # 2
'Fox', # 3
'Opossums', # 4
'Raccoon', # 5
'Rodent', # 6
'Skunk', # 7
'Snake', # 8
'Weasel', # 9
]
This is a much harder problem than egg detection. Ten visually similar classes, uncontrolled outdoor environments, variable lighting, complex backgrounds, partial occlusion, and animals that appear in any pose. A coyote and a fox share body proportions. A weasel and a large rodent are nearly identical in silhouette at 320×320. A snake breaks the standard four-legged silhouette entirely.
Dataset
The predator dataset required a preparation step you will need to repeat if you ever rebuild it. We started from a large dataset called Poultry-and-Predators-Detection with 19 classes: a mix of farm animals (chicken, duck, turkey, emu, ostrich, guinea fowl, quail, etc.) and the predators we need. Training on all 19 classes wastes model capacity on irrelevant poultry recognition and risks confusion in frames where a chicken and a fox appear together, so the labels were reduced to the ten predator classes listed above before training.
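One way to carry out this kind of preparation is to drop every label line belonging to an unwanted class and renumber the remaining classes. The sketch below shows the idea on in-memory label lines. The source indices in `KEEP_MAP` are hypothetical placeholders, since the class order of the original 19-class dataset is not given here; they would have to be read from that dataset's own data.yaml.

```python
def remap_label_lines(lines, keep_map):
    """Keep only lines whose class is in keep_map and rewrite the class index."""
    out = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        src_cls = int(parts[0])
        if src_cls in keep_map:
            out.append(" ".join([str(keep_map[src_cls])] + parts[1:]))
    return out


# HYPOTHETICAL mapping: source index in the 19-class set -> new index (0..9).
KEEP_MAP = {2: 0, 5: 1, 17: 3}

labels = [
    "2 0.5 0.5 0.2 0.2",     # kept, becomes class 0
    "11 0.1 0.1 0.05 0.05",  # a poultry class in this example, dropped
    "17 0.3 0.4 0.1 0.1",    # kept, becomes class 3
]
filtered = remap_label_lines(labels, KEEP_MAP)
```

Images whose label file ends up empty can either be removed or kept as background examples; which of the two was done for this project is not stated.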
Augmentation
The augmentation strategy for the predator model is similar to the egg model in most ways, but with two meaningful differences.
Mixup augmentation, which was disabled entirely for the egg model, is enabled here at a probability of 0.1. Mixup takes two training images and blends them together at a random mixing ratio, combining their labels proportionally. The result is a slightly transparent double-exposure in which you see parts of both images overlaid. For a single-class model like the egg detector, mixup adds little value. For a 10-class model where several classes look visually similar, mixup provides a regularisation effect that forces the model to produce calibrated, uncertain predictions rather than being overconfidently wrong. When a Coyote image is mixed with a Fox image, the model learns that the two are related but distinct, which leads to better-calibrated confidence scores at inference time.
Rotation is reduced to ±10 degrees compared to ±15 degrees in the egg model. Eggs can sit at almost any angle in a nesting box, but predators are gravity-bound - they walk upright (or slither horizontally) and rarely appear at steep angles relative to the camera's viewing direction. Allowing too much rotation during training could teach the model to expect animals in unrealistic orientations, which might actually hurt performance on real frames.
Training Configuration
The predator model was trained for 100 epochs, more than three times longer than the egg model. This is not arbitrary. Distinguishing 10 visually similar animal species requires the model to learn much subtler discriminative features than recognising a single oval object class. The backbone needs to develop a richer internal representation to tell a Coyote from a Fox, and that takes more iterations.
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
model.train(
data = 'predator_detection_model/dataset/data.yaml',
epochs = 100, # 3× longer than egg model — 10 classes need more iterations
imgsz = 320,
batch = 16,
device = 'cuda',
optimizer= 'AdamW',
lr0 = 0.001,
lrf = 0.01,
momentum = 0.937,
weight_decay = 0.0005,
warmup_epochs = 3,
warmup_momentum= 0.8,
warmup_bias_lr = 0.1,
patience = 30, # wider window — 10-class models plateau longer before improving
cache = True,
augment = True,
project = 'predator_detection_model',
name = 'predator',
# augmentation
hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
degrees=10, translate=0.1, scale=0.5,
fliplr=0.5, flipud=0.0, mosaic=1.0, mixup=0.1,
save=True, save_period=10, plots=True, verbose=True,
)
The optimizer is also AdamW at lr0=0.001, matching the egg model. However, the patience for early stopping is set to 30 epochs rather than 20. With 10 classes, training can plateau for extended periods before breaking through to better performance as the model reorganises its feature representations. A patience of 20 would risk cutting the training short during one of those plateaus.
Training The Vision Models
Open the relevant notebook and run the cells in order.
Environment Setup
Training should be done on a proper machine, not on the Arduino UNO Q. A machine with an NVIDIA GPU will give the fastest results: the egg model trains in about 30–40 minutes on a mid-range GPU, the predator model in around 2 hours. Apple Silicon (mps) is also well supported by Ultralytics. On a CPU-only machine, expect the egg model to take several hours and the predator model most of a day.
The first cell in each notebook installs all dependencies:
pip install ultralytics onnx onnxruntime onnxslim
Check which compute device is available before configuring:
import torch
print('CUDA available:', torch.cuda.is_available()) # NVIDIA GPU
print('MPS available:', torch.backends.mps.is_available()) # Apple Silicon
# Set DEVICE = 'cuda' | 'mps' | 'cpu' in the configuration cell
A wrong path in data.yaml is the most common cause of a "dataset not found" error at the start of training. Verify it before running the training cell.
Starting from Pre-trained Weights
Both notebooks initialise the model with:
model = YOLO('yolov8n.pt')
# Ultralytics downloads yolov8n.pt automatically on first use (~6 MB)
Do not skip this by initialising from a YAML config file, which creates random weights. The difference in convergence speed and final accuracy between COCO pre-trained weights and random weights is substantial.
What to Watch During Training
The most important number in the console output is `mAP50(B)` in the validation columns:
Epoch GPU_mem box_loss cls_loss dfl_loss mAP50(B) mAP50-95(B)
1/30 0.00G 1.345 1.563 1.206 0.864 0.517
5/30 0.00G 1.170 0.820 1.123 0.936 0.648
10/30 0.00G 1.147 0.748 1.122 0.957 0.663
22/30 0.00G 1.063 0.636 1.090 0.965 ← best
30/30 0.00G 1.014 0.583 1.058 0.960
For the egg model, `mAP50` should cross 0.90 within the first 5–7 epochs. For the predator model, expect it to stay in the 0.60s for the first 20–30 epochs; that is normal. The predator model goes through a longer plateau before inter-class discrimination starts to sharpen.
What Gets Saved After Training
egg_detection_model/yolov8n_eggs/
├── weights/
│ ├── best.pt ← highest validation mAP50 — use this for export
│ └── last.pt ← final epoch — not guaranteed to be the best
├── results.csv ← per-epoch metrics
├── results.png ← training curves
├── confusion_matrix.png
├── PR_curve.png
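└── F1_curve.png
Always use best.pt for export. best.pt is overwritten whenever the validation score improves during training. Ultralytics ranks checkpoints by a fitness value weighted mostly toward mAP50-95, so this usually, but not always, coincides with the best mAP50.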
Validating the Best Checkpoint
The validation cell in each notebook runs this automatically. You can also run it standalone:
from ultralytics import YOLO
model = YOLO('egg_detection_model/yolov8n_eggs/weights/best.pt')
metrics = model.val()
print(f'mAP@50 : {metrics.box.map50:.3f}')
print(f'mAP@50-95 : {metrics.box.map:.3f}')
print(f'Precision : {metrics.box.mp:.3f}')
print(f'Recall : {metrics.box.mr:.3f}')
# Predator model only — check per-class AP to find weak classes
CLASS_NAMES = ['Bobcat','Cougar','Coyote','Fox','Opossums',
'Raccoon','Rodent','Skunk','Snake','Weasel']
for name, ap in zip(CLASS_NAMES, metrics.box.ap50):
    flag = ' ← needs more data' if ap < 0.50 else ''
    print(f' {name:12s}: {ap:.3f}{flag}')
Any predator class scoring below 0.50 AP@50 should be treated as a signal to collect more targeted images for that class before the next deployment.
Exporting To ONNX
After training, the model lives in a .pt file that requires PyTorch to load. On the Arduino UNO Q we want something lighter and framework-independent. ONNX (Open Neural Network Exchange) is a hardware-neutral format that describes the network graph in a standardised way.
We use ONNX because OpenCV's cv2.dnn module loads ONNX files directly with cv2.dnn.readNetFromONNX. The only software dependency on the board is OpenCV - no PyTorch, no TensorFlow, no separate runtime. OpenCV is already there for camera capture and image processing, so inference comes essentially for free.
The export cell in each notebook runs:
from ultralytics import YOLO
# Egg model
model = YOLO('egg_detection_model/yolov8n_eggs/weights/best.pt')
model.export(
format = 'onnx',
imgsz = 320, # must match the resolution used during training
opset = 12, # cv2.dnn supports up to opset 12 — use 13+ and it will fail
simplify= True, # fuses redundant ops and reduces file size with no accuracy change
dynamic = False,# fix batch=1 — cv2.dnn does not support dynamic batch axes
)
# Output: egg_detection_model/yolov8n_eggs/weights/best.onnx
# Predator model
model = YOLO('predator_detection_model/predator/weights/best.pt')
model.export(format='onnx', imgsz=320, opset=12, simplify=True, dynamic=False)
# Output: predator_detection_model/predator/weights/best.onnx
Audio ML Model For Distress Detection
The audio model listens continuously to the coop through a USB microphone and classifies each 3-second window as either a distress call or normal chicken sound. When chickens are alarmed (predator approach, sudden intrusion, physical stress) they produce distinctive alarm calls that are acoustically separable from normal vocalisations. This model detects that signature and raises an alert even when the camera cannot see the threat: at night, through walls, or when the flock is out of frame.
The model does not identify which predator is present. It detects that the flock is distressed — that is the operationally relevant signal.
arecord 44.1 kHz
↓ resample_poly → 16 kHz, normalise
↓
[Stage 1] YAMNet — AudioSet class scorer
chicken/bird score > 0.05?
no → log "skipped — not chicken", discard clip
yes ↓
[Stage 2] spectral gating → log-mel (297 × 64) → Mel-CNN
distress probability → rolling vote window (2 clips)
avg ≥ 0.72 and window full → DISTRESS alert
Why two stages: The Mel-CNN alone cannot reject human speech, claps, or bangs because it was trained only on chicken audio. YAMNet, pre-trained on 521 AudioSet classes, filters non-chicken sounds before the CNN runs, eliminating false positives from incidental sounds at near-zero computational cost (~19 ms per window).
Why 3-second clips: Egg-song ("buck-buck-BAWK", post-lay vocalisation) spans ~2–3 s. With 1-second clips, the high-energy "BAWK" component alone looks like distress to the CNN. At 3 seconds the full rhythmic pattern is visible and the model can correctly classify egg-song as normal. Alarm calls also benefit — their rapid staccato repetition is more discriminable from isolated startles over a longer window.
Stage 1 - YAMNet Gate
YAMNet (Google, AudioSet pre-trained, 521 classes) runs on up to two 0.975-second windows of each 3-second clip and scores them against chicken-relevant AudioSet classes. Only the class scores are used — not the 1024-dim embedding.
Relevant class indices (from yamnet_class_map.csv):
| Group | Indices | Weight |
|-------|---------|--------|
| Chicken / Cluck | 94, 95 | 1.0× |
| Bird / Bird vocalization | 106, 107 | 0.6× |
| Livestock / Animal | 81, 67 | 0.3× |
| Rooster crowing (block) | 96 | hard-block at > 0.35 |
GATE_THRESHOLD = 0.05 # very permissive, only blocks obvious non-chicken
chicken_score = float(scores[[94, 95]].max())
bird_score = float(scores[[106, 107]].max()) * 0.6
animal_score = float(scores[[81, 67]].max()) * 0.3
window_score = max(chicken_score, bird_score, animal_score)
Rooster crowing (class 96) is hard-blocked at score > 0.35. The threshold is set high enough that alarm calls (which share spectral energy with crowing) are not blocked, while definite crowing is still rejected, because the CNN was not trained on rooster calls and would false-positive on them.
If the gate blocks a clip (window_score < 0.05), the clip is discarded and the rolling vote window is not cleared - a brief non-chicken sound during a distress event should not reset accumulated evidence.
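The gate and the rolling vote can be put together in a short pure-Python sketch. The thresholds and class indices come from the text above; the class and function names (`RollingVote`, `passes_gate`, `process_clip`) are invented for illustration, and the sketch assumes one vector of 521 YAMNet scores per clip, since how scores from multiple YAMNet windows are combined is not specified here.

```python
from collections import deque

GATE_THRESHOLD = 0.05   # from the text
ROOSTER_BLOCK = 0.35    # from the text
ALERT_THRESHOLD = 0.72  # from the text
VOTE_LEN = 2            # rolling vote over the last 2 clips


def gate_score(scores):
    """Weighted best-of score over the chicken-relevant AudioSet classes."""
    chicken = max(scores[94], scores[95])
    bird = max(scores[106], scores[107]) * 0.6
    animal = max(scores[81], scores[67]) * 0.3
    return max(chicken, bird, animal)


def passes_gate(scores):
    if scores[96] > ROOSTER_BLOCK:  # definite rooster crowing: hard block
        return False
    return gate_score(scores) >= GATE_THRESHOLD


class RollingVote:
    def __init__(self, size=VOTE_LEN, threshold=ALERT_THRESHOLD):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def update(self, p_distress):
        """Add one CNN output; return True when the alert should fire."""
        self.window.append(p_distress)
        full = len(self.window) == self.window.maxlen
        return full and sum(self.window) / len(self.window) >= self.threshold


def process_clip(scores, cnn_prob, vote):
    """Blocked clips are discarded WITHOUT clearing the vote window."""
    if not passes_gate(scores):
        return False
    return vote.update(cnn_prob)
```

Because a blocked clip never touches the deque, a distress clip before and after a brief non-chicken sound still counts as two consecutive votes.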
Stage 2 - Mel-CNN
Architecture from de Carvalho Soster et al. 2025, trained on ChickenLanguageDataset. Input is a log-mel spectrogram of the spectrally-gated clip.
SR = 16000
N_MELS = 64
N_FFT = 512
HOP_LENGTH = 160
FMIN, FMAX = 50, 8000
N_FRAMES = 297 # (48000 − 512) // 160 + 1
TARGET_LEN = 48000 # 3 seconds at 16 kHz
Preprocessing chain:
import numpy as np
import librosa
import noisereduce as nr
# wav: mono float waveform at SR (16 kHz), TARGET_LEN samples long
# 1. Spectral gating: non-stationary noisereduce, full suppression
denoised = nr.reduce_noise(y=wav, sr=SR, stationary=False, prop_decrease=1.0)
denoised /= max(np.abs(denoised).max(), 1e-9) # renormalise after gating; guards against an all-zero clip
# 2. Log-mel spectrogram
mel = librosa.feature.melspectrogram(
    y=denoised, sr=SR, n_fft=512, hop_length=160,
    n_mels=64, fmin=50, fmax=8000, window='hann'
)
mel_db = librosa.power_to_db(mel, ref=np.max).T[:297]
# 3. Normalise to [-1, 1]
mel_db = 2.0 * (mel_db - mel_db.min()) / max(mel_db.max() - mel_db.min(), 1e-9) - 1.0
# Shape fed to CNN: (1, 297, 64, 1)
CNN layer stack (11 Conv2D blocks, valid padding, ELU + BatchNorm, no MaxPool):
| Layer | Filters | Kernel | Stride | Output (time × freq) |
|-------|---------|--------|--------|----------------------|
| Conv2D 1 | 64 | 3×3 | 1×1 | 295 × 62 |
| Conv2D 2 | 64 | 3×3 | 1×1 | 293 × 60 |
| Conv2D 3 | 96 | 4×4 | 2×2 | 145 × 29 |
| Conv2D 4 | 96 | 3×3 | 1×1 | 143 × 27 |
| Conv2D 5 | 128 | 5×3 | 3×2 | 47 × 13 |
| Conv2D 6 | 128 | 3×3 | 1×1 | 45 × 11 |
| Conv2D 7 | 128 | 4×3 | 2×2 | 21 × 5 |
| Conv2D 8 | 128 | 3×3 | 1×1 | 19 × 3 |
| Conv2D 9 | 128 | 3×3 | 1×1 | 17 × 1 ← freq collapses |
| Conv2D 10 | 128 | 3×1 | 2×1 | 8 × 1 |
| Conv2D 11 | 128 | 4×1 | 1×1 | 5 × 1 |
| Flatten | — | — | — | 640 features |
| Dropout(0.3) + Dense(1, sigmoid) | — | — | — | P(distress) |
Rolling vote: the last 2 CNN outputs are averaged. The alert fires when avg ≥ 0.72 and both slots are filled.
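The output sizes in the layer table can be checked with the usual valid-padding formula, out = (in − kernel) // stride + 1, applied separately to the time and frequency axes. The short sketch below walks the 11 layers from a 297 × 64 input; the names `LAYERS` and `trace_shapes` are only for this example.

```python
def valid_out(n, k, s):
    """Output length of a valid-padded convolution along one axis."""
    return (n - k) // s + 1


# (filters, (kernel_time, kernel_freq), (stride_time, stride_freq))
LAYERS = [
    (64, (3, 3), (1, 1)),
    (64, (3, 3), (1, 1)),
    (96, (4, 4), (2, 2)),
    (96, (3, 3), (1, 1)),
    (128, (5, 3), (3, 2)),
    (128, (3, 3), (1, 1)),
    (128, (4, 3), (2, 2)),
    (128, (3, 3), (1, 1)),
    (128, (3, 3), (1, 1)),
    (128, (3, 1), (2, 1)),
    (128, (4, 1), (1, 1)),
]


def trace_shapes(t=297, f=64):
    """Return (time, freq, filters) after each Conv2D layer."""
    shapes = []
    for filters, (kt, kf), (st, sf) in LAYERS:
        t, f = valid_out(t, kt, st), valid_out(f, kf, sf)
        shapes.append((t, f, filters))
    return shapes


shapes = trace_shapes()
t, f, c = shapes[-1]
flat_features = t * f * c  # size of the Flatten output
```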
Dataset
ChickenLanguageDataset - https://github.com/zebular13/ChickenLanguageDataset
Label mapping:
| Label | Folders |
|-------|---------|
| distress | aerial_alarm, ground_alarm, ouch |
| normal | eating, greeting, tidbitting_hen, where_is_everyone, disturbed_in_nest_box, let_us_out, need_nest_box, hungry, noise |
The distress set is small. Each raw clip is sliced into 3-second segments with a 1-second hop. Augmentation is asymmetric to compensate:
| Class | Augmented copies per clip | Total per raw clip |
|-------|--------------------------|-------------------|
| distress | 12 | 13 |
| normal | 3 | 4 |
Training
The model is trained end-to-end on the augmented mel spectrograms extracted in the previous step. No pre-trained weights are used — the CNN learns from scratch on ChickenLanguageDataset.
Optimizer: Adam at lr=1e-3, the same initial learning rate as the egg/predator models (which use AdamW at lr0=1e-3). Starting this high when training from scratch is workable here because the mel-CNN is comparatively shallow and the input space is already well-structured by the mel spectrogram transform. The ReduceLROnPlateau callback halves the learning rate when validation loss plateaus, bringing it down automatically through training.
Loss — focal loss instead of binary cross-entropy:
FL(p_t) = −α(1 − p_t)^γ · log(p_t), with γ=2.0, α=0.25
Focal loss down-weights easy examples (clips the model classifies correctly with high confidence) and concentrates gradient signal on the ambiguous cases near the decision boundary. With a small and imbalanced dataset (distress is a minority class even after asymmetric augmentation), focal loss consistently outperforms standard BCE. Class weights are not used; focal loss handles the imbalance directly.
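For a feel of how strongly the (1 − p_t)^γ factor suppresses easy examples, here is a minimal per-sample sketch in plain Python. It is not the notebook's `focal_loss` helper, whose implementation is not shown here. It assumes the common α-balanced convention in which α weights the positive (distress) class and 1 − α the negative class; the formula above writes a single α, so treat that detail as an assumption.

```python
import math


def binary_focal_loss(y_true, p, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one sample; p is the predicted P(distress)."""
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p      # probability given to the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(1, 0.9)   # confident and correct
hard = binary_focal_loss(1, 0.3)   # true distress clip scored low
ratio = hard / easy                # how much more the hard clip contributes
```

With these settings the hard example contributes several hundred times more loss than the easy one, which is the down-weighting effect described above.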
Early stopping monitors `val_auc` (not val_loss): AUC is threshold-independent — it measures the model's ability to rank distress clips above normal clips regardless of where the decision boundary sits. This matters because the deployment threshold is tuned separately on the validation set after training; optimising for AUC during training gives the best raw separability to tune against.
Split: 70% train / 15% validation / 15% test, stratified by label. The split is performed before augmentation to prevent leakage — augmented versions of training clips never appear in validation or test.
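The ordering matters more than the exact tooling, so here is a minimal pure-Python sketch of "split first, augment second". It assumes each raw clip is represented as a (clip_id, label) pair and that only the training split is augmented; the function names and the fixed seed are illustrative, and the notebook may well use a library splitter instead.

```python
import random
from collections import defaultdict

# Augmented copies per raw clip, taken from the augmentation table.
AUG_COPIES = {"distress": 12, "normal": 3}


def stratified_split(clips, seed=0, train=0.70, val=0.15):
    """Split (clip_id, label) pairs per label so each split keeps the class ratio."""
    by_label = defaultdict(list)
    for clip in clips:
        by_label[clip[1]].append(clip)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for _, group in sorted(by_label.items()):
        rng.shuffle(group)
        n_train = round(len(group) * train)
        n_val = round(len(group) * val)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits


def augment(split_clips):
    """Expand one split: copy 0 is the original clip, 1..N are augmented copies."""
    expanded = []
    for clip_id, label in split_clips:
        for copy_index in range(AUG_COPIES[label] + 1):
            expanded.append((clip_id, label, copy_index))
    return expanded


if __name__ == "__main__":
    clips = [(f"d{i}", "distress") for i in range(20)]
    clips += [(f"n{i}", "normal") for i in range(40)]
    splits = stratified_split(clips)
    print({name: len(part) for name, part in splits.items()})
    print("augmented training items:", len(augment(splits["train"])))
```

Because augmentation runs on a split that has already been fixed, every copy of a raw clip inherits that clip's split and cannot appear on both sides of the train/validation boundary.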

    import tensorflow as tf
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
    from tensorflow.keras.metrics import AUC, Precision, Recall

    # model, focal_loss, train_ds, val_ds and MODEL_SAVE are defined earlier in the notebook.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss=focal_loss(gamma=2.0, alpha=0.25),
        metrics=['accuracy', AUC(name='auc'), Precision(), Recall()],
    )
    model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=80,
        callbacks=[
            # mode='max' is set explicitly because a higher AUC is better.
            EarlyStopping(monitor='val_auc', mode='max', patience=12, restore_best_weights=True),
            ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7),
            ModelCheckpoint(MODEL_SAVE, monitor='val_auc', mode='max', save_best_only=True),
        ],
    )

Threshold tuning: After training, the notebook scans thresholds from 0.10 to 0.90 on the validation set and selects the value that maximises F1. Use the printed best_thresh, not 0.5, in inference.py. Focal loss and the asymmetric augmentation both shift the model's output scores, so they should not be read as calibrated probabilities and 0.5 is not a meaningful default. In the run reported below, the best threshold was 0.72, well above 0.5.
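The threshold scan itself is simple enough to sketch without any ML library. The version below assumes binary labels (1 = distress), one score per validation clip, and a step of 0.01 between 0.10 and 0.90; the step size and function names are assumptions, since the notebook's exact scan is not shown here.

```python
def f1_at(labels, scores, thresh):
    """F1 for the positive class when scores >= thresh are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thresh)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thresh)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < thresh)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(labels, scores, lo=0.10, hi=0.90, step=0.01):
    """Scan thresholds in [lo, hi] and return (threshold, f1) with the highest F1."""
    best_t, best_f1 = lo, -1.0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        t = round(lo + i * step, 4)  # avoid floating-point drift in the grid
        f1 = f1_at(labels, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    val_labels = [0, 0, 0, 1, 1, 1]
    val_scores = [0.20, 0.40, 0.65, 0.70, 0.80, 0.95]
    print(best_threshold(val_labels, val_scores))
```

On ties this sketch keeps the lowest threshold, which favours recall on distress calls; choosing the highest tied value instead would favour precision.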
Training results:

    Best val_auc : 0.9843
    Test accuracy: 94%
    F1 (distress): 0.9486
    Best threshold (val F1): 0.72

TFLite export, float32 (primary):

    conv = tf.lite.TFLiteConverter.from_keras_model(model)
    with open('distress_cnn.tflite', 'wb') as f:
        f.write(conv.convert())
    # Size: ~4–5 MB float32

INT8 export (optional): Set EXPORT_INT8 = True in the notebook's export cell to also produce distress_cnn_int8.tflite (~1.18 MB). Zero accuracy drop was observed on this dataset. Use INT8 only if disk or RAM is constrained; float32 is the default deployment target.
Feeder Design
The mechanical design of the feeder remains mostly unchanged, but the actuation system has been improved. Instead of a servo motor, the feeder now uses a 28BYJ-48 stepper motor, which allows more precise control over the feed dispensing process. By controlling the number of motor steps, the system can regulate the amount of food released during each feeding cycle.
The stepper motor is driven using a ULN2003 driver module, which amplifies the control signals from the main controller and drives the motor coils in sequence.
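As an illustration of step-based dosing, the sketch below shows a common half-step coil sequence for a four-input ULN2003 board and a conversion from grams of feed to steps. It is written in Python for readability and does not drive any hardware. The 4096 steps-per-revolution figure is a nominal assumption for a geared 28BYJ-type motor in half-step mode, and grams_per_rev is a hypothetical calibration value that would have to be measured on the actual feeder screw.

```python
# Half-step drive sequence for a 4-coil unipolar stepper (IN1..IN4 on the driver board).
HALF_STEP_SEQUENCE = [
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 1, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
    (1, 0, 0, 1),
]

STEPS_PER_REV = 4096  # assumption: nominal half-steps per output-shaft revolution


def steps_for_grams(grams, grams_per_rev):
    """Convert a feed amount to motor steps using a measured calibration value."""
    if grams_per_rev <= 0:
        raise ValueError("grams_per_rev must be positive")
    return round(grams / grams_per_rev * STEPS_PER_REV)


def coil_patterns(n_steps):
    """Yield the coil pattern to apply for each of n_steps consecutive steps."""
    for i in range(n_steps):
        yield HALF_STEP_SEQUENCE[i % len(HALF_STEP_SEQUENCE)]


if __name__ == "__main__":
    steps = steps_for_grams(50, grams_per_rev=25)  # hypothetical: 25 g per turn
    print("steps for 50 g:", steps)
    print("first patterns:", list(coil_patterns(4)))
```

Calibrating grams_per_rev is a matter of running a known number of revolutions and weighing the output; the step count per portion then follows directly.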
Heating System
Smart Coop 2.0 includes an integrated heating system designed to maintain a stable temperature inside the coop during colder conditions.
Temperature inside the coop is monitored using the Thermo Modulino, which is connected to the Arduino UNO Q. Based on these real-time readings, the controller decides when heating is required.
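One simple way to turn temperature readings into a heater decision is a thermostat with hysteresis, sketched below. This is an assumption about the control logic, not the project's actual firmware, and the 5 °C and 10 °C thresholds are hypothetical placeholders.

```python
def heater_should_be_on(temp_c, currently_on, on_below_c=5.0, off_above_c=10.0):
    """Hysteresis thermostat: switch on below one threshold, off above another."""
    if on_below_c >= off_above_c:
        raise ValueError("on_below_c must be lower than off_above_c")
    if temp_c < on_below_c:
        return True
    if temp_c > off_above_c:
        return False
    return currently_on  # inside the band: keep the previous state


def simulate(readings, start_on=False):
    """Return the heater state after each reading in sequence."""
    state = start_on
    states = []
    for temp_c in readings:
        state = heater_should_be_on(temp_c, state)
        states.append(state)
    return states


if __name__ == "__main__":
    print(simulate([12, 8, 4, 7, 9.5, 11, 9]))
```

The gap between the two thresholds prevents the heater from toggling rapidly when the temperature hovers around a single setpoint.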
The heating element is switched by an IRLB8721PBF N-channel power MOSFET. It is a logic-level part that works with the 3.3V logic of the UNO Q, so it can be driven directly from the Arduino UNO Q without additional driver circuitry.
For demonstration purposes, a heating LED (a bulb-like indicator) is used. It does not produce actual heat and is included only to simulate the heating system safely during testing.
In a real deployment, this can be replaced with a proper heating solution such as a PTC ceramic air heater, which offers safer, self-regulating behavior.
Note: With higher-wattage heaters (such as PTC heaters), the current requirement can increase significantly. The existing buck converter may not be able to handle this load, so the current capacity should be evaluated and the power stage may need to be redesigned.
Power Design
The Smart Coop 2.0 system is powered by a single 12V 10A power supply. High-power components such as the DC door motor and the heating system are driven directly from the 12V line.
For low-power electronics, a TPS565201 buck converter steps down the 12V supply to a stable 5V output. This 5V is routed through a USB female module, then via a USB cable to the USB hub’s power delivery input, which powers the Arduino UNO Q, webcams, and microphone, while also supplying the ULN2003 stepper motor driver. The converter operates with high efficiency (around 85–90%), ensuring minimal heat generation and stable performance under varying loads.
The Arduino UNO Q and connected USB devices (webcams and microphone) are powered through a USB hub using a USB-A module connected to the power delivery line. This allows multiple peripherals to be powered from a single point while keeping the wiring neat and modular.
Note: Powering the Arduino UNO Q through USB PD is not recommended for long-term use. Use a dedicated power supply for a stable and reliable setup.
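Before swapping in a real heater, it helps to run a quick current budget against the 12V 10A supply. The sketch below shows the arithmetic. The 3 A load on the 5V rail, the 2 A door motor and the 60 W heater are hypothetical example figures, not measurements from this build, and the function names are illustrative.

```python
def buck_input_current(v_in, v_out, i_out, efficiency):
    """Current drawn from the input rail by a buck converter at a given load."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (v_out * i_out) / (efficiency * v_in)


def heater_current(power_w, v_supply=12.0):
    """Current drawn by a resistive heater of the given power on the supply rail."""
    return power_w / v_supply


def remaining_supply_current(supply_a, loads_a):
    """Headroom left on the supply after all loads; negative means overloaded."""
    return supply_a - sum(loads_a)


if __name__ == "__main__":
    loads = [
        buck_input_current(12.0, 5.0, 3.0, 0.85),  # hypothetical 3 A on the 5V rail
        2.0,                                       # hypothetical door motor draw
        heater_current(60.0),                      # hypothetical 60 W heater
    ]
    print("loads (A):", [round(a, 2) for a in loads])
    print("headroom (A):", round(remaining_supply_current(10.0, loads), 2))
```

Using the lower end of the efficiency range gives the more conservative estimate, and a motor's stall current can be far higher than its running current, so both deserve margin.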
To improve reliability and reduce wiring complexity, a custom PCB was designed using KiCad.
The PCB lets the Arduino UNO Q and the ULN2003 stepper motor driver plug in directly on standard headers, which keeps the system modular and makes both parts easy to replace. The BTS7960 motor driver is used as an external module and is not part of the PCB design. Instead, it is connected to the main PCB with male-to-male jumper wires, which simplifies the board.
Both the main PCB and the BTS7960 module are mounted on a dedicated PCB mounting plate, which helps keep the wiring organized and clean.
The board also integrates power regulation, connectors, and control circuitry into a compact layout, with screw terminals and headers for easy wiring. A solid ground plane and organized routing improve stability, while mounting holes allow secure installation onto a dedicated mounting plate.
These are the fabricated PCBs.
This is a fully assembled one.
The entire body was 3D printed using the Bambu Lab A1 with PLA, providing a good balance of strength, cost, and print reliability.
We started by assembling the door section. M4 × 6 mm threaded inserts were added to the front parts to create reusable mounting points. The motor mount was then installed, followed by the side mounts to hold the rails using M4 screws, along with the limit switch mounts and switches. The door was prepared by inserting linear ball bearings, and both door panels were placed onto the rails. Finally, the belt was installed to connect the motor and complete the mechanism.
Then, the front section was attached to the back section using the steel rods.
Then, the PCB mounting plate was installed, starting with the DC jack, followed by the PCB, motor driver, and USB hub.
Next, the feeder section was assembled by positioning and securing the stepper motor and feeding mechanism (Archimedes screw).
After assembling the feeder, the heating LEDs were mounted using 3M tape.
Then, the front and back acrylic panels were attached, along with mounting the webcams.
Then, the assembly was completed by placing both roof sections.
All screw sizes and placements can be found by referring to the 3D model. Most mounts use M4 screws along with threaded inserts. For the door mechanism, a GT2 timing belt is used along with LM8UU linear ball bearings on 8 mm rods for smooth movement, while 6 mm rods are used to join and align the structural parts.
Building Dashboard With Arduino IoT Cloud
The Smart Coop runs autonomously — the door opens at dawn, the heater warms the coop on cold mornings, and predator detection is always on. But autonomous doesn't mean unobservable. The Arduino Cloud dashboard gives two things:
- Telemetry up — live temperature, humidity, egg count, predator alerts, door state.
- Commands down — override switches for the door, the heater, and the feeder, from anywhere with a browser or the Arduino IoT Remote app.
Both flows run through the same Cloud Brick container on the UNO Q's Linux side. Let's start by connecting the Arduino UNO Q to the cloud.
Register the UNO Q
The UNO Q is registered in Arduino Cloud as a manual device, the same category used for ESP32 and Nano ESP32 boards. This is the single most important step, because it generates the credentials the Cloud Brick will need.
Follow these steps:
- Go to https://cloud.arduino.cc → Devices tab.
- Click Add Device.
- Under Manual device, select Arduino UNO Q → Continue.
- Name the device (e.g., UnoQ) → Continue.
Arduino Cloud now generates your credentials:
- Device ID — a public identifier, looks like a UUID.
- Secret Key — the authentication token.
Click Download PDF before closing the dialog. The Secret Key is shown exactly once and cannot be recovered. If you lose it, the only fix is to delete the device and re-register it, which invalidates the previous credentials everywhere they were used.
After this step, the device appears in your Devices list as OFFLINE. It will stay offline until the Cloud Brick on the UNO Q presents the matching credentials.
Create the Thing and Variables
A Thing is the logical container that binds your device to a set of named cloud variables. Each variable has a type, a permission, and an update policy - these define what the dashboard can show and what the device is allowed to do with them.
Create the Thing
- In Arduino Cloud → Things tab → Create Thing.
- Name it (e.g., SmartCoop).
- Under Associated Device, click Select Device and pick the UNO Q you registered.
Add cloud variables
Click Add Variable for each one you want. The important fields are:
- Name: The exact string you'll reference from Python (cloud["temperature"]). Case-sensitive.
- Type: Data type. Use semantic types (Temperature, Percentage, Velocity) where possible — dashboards auto-pick sensible widgets and units. Otherwise, use float, int, bool, and String.
- Permission: Read Only = device publishes, dashboard displays. Read & Write = dashboard can also push values down to the device (for actuators, setpoints).
- Update Policy: On change = publish only when value changes (efficient, recommended for sensors). Periodically = publish on a fixed interval regardless.
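The permission and update-policy fields are easier to reason about with a toy model. The class below is a stand-in written only for this explanation; it is not the Arduino Cloud or Cloud Brick API, and every name in it is hypothetical. It treats each device-side update under the "Periodically" policy as one interval tick.

```python
class CloudVariableModel:
    """Toy model of one cloud variable; NOT the Arduino Cloud or Cloud Brick API."""

    def __init__(self, name, read_write=False, on_change=True):
        self.name = name
        self.read_write = read_write  # False models "Read Only"
        self.on_change = on_change    # False models "Periodically"
        self.value = None
        self.published = []           # values that would have been sent to the cloud

    def device_update(self, value):
        """Device-side update; returns True if a publish would happen."""
        changed = value != self.value
        self.value = value
        if self.on_change and not changed:
            return False
        self.published.append(value)
        return True

    def dashboard_write(self, value):
        """Dashboard-side write; only allowed for Read & Write variables."""
        if not self.read_write:
            raise PermissionError(f"{self.name} is Read Only")
        self.value = value


if __name__ == "__main__":
    temperature = CloudVariableModel("temperature")              # sensor, Read Only
    heater = CloudVariableModel("heater_on", read_write=True)    # actuator override
    print([temperature.device_update(v) for v in (21.5, 21.5, 22.0)])
    heater.dashboard_write(True)
    print(heater.value)
```

Sensors map naturally to Read Only with On change, while the override switches for the door, heater and feeder need Read & Write so the dashboard can push values down.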
Build the Dashboard
A Dashboard is the visual layer. It consists of widgets, each linked to a variable on a Thing.
- Open the Dashboards tab → Build Dashboard.
- Name it (e.g. Smart Coop - Dashboard).
Click Add → choose a widget type. Useful ones:
- Gauge / Value — current reading
- Chart — time-series history (added automatically for numeric variables)
- Switch / Push Button — for Read & Write variables that drive actuators
- Messenger — free-form string updates, good for debug / status
For each widget, click Link Variable → pick the Thing → pick the variable. Save.
Widgets will sit idle, showing — or 0 until the device connects and publishes. Don't worry — that's expected at this stage.
Twilio For SMS Alerts
We're using Twilio to send SMS alerts from the coop to your phone when something urgent happens (for example, a predator is detected), so you're notified immediately instead of having to watch the dashboard.
Follow these steps to set up Twilio for this project:
1. Create a Twilio account at https://www.twilio.com (trial is fine to start).
2. Buy a phone number (or use the free trial number — note trial accounts can only send SMS to verified numbers, so verify your own phone first under Phone Numbers → Verified Caller IDs).
3. From the Twilio Console dashboard, copy:
- Account SID — starts with AC…
- Auth Token — click to reveal
- Twilio phone number — in E.164 format, e.g. +15551234567
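With those three values in hand, sending an alert can be sketched as follows, using the Twilio Python helper library (pip install twilio). This is a minimal sketch rather than the project's own alert code: the environment variable names, the message wording and the helper function names are assumptions. The Twilio import sits inside the sending function so the validation helpers can be used without the library installed.

```python
import os
import re

# E.164: a leading '+', a non-zero first digit, at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number):
    """Return True if the number is formatted like +15551234567."""
    return E164_PATTERN.fullmatch(number) is not None


def build_alert(event, detail=""):
    """Compose the SMS body for a coop event."""
    text = f"Smart Coop alert: {event}"
    return f"{text} ({detail})" if detail else text


def send_sms_alert(to_number, event, detail=""):
    """Send one SMS through Twilio and return the message SID."""
    from twilio.rest import Client  # imported lazily; requires the twilio package

    from_number = os.environ["TWILIO_FROM_NUMBER"]  # assumed variable name
    for number in (to_number, from_number):
        if not is_e164(number):
            raise ValueError(f"not an E.164 number: {number!r}")

    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    message = client.messages.create(
        body=build_alert(event, detail),
        from_=from_number,
        to=to_number,
    )
    return message.sid


if __name__ == "__main__":
    print(is_e164("+15551234567"))
    print(build_alert("predator detected", "front camera"))
```

Keeping the credentials in environment variables rather than in the script avoids committing the Auth Token to a repository.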
Verify that your region is supported for SMS. Twilio's Geo Permissions page lists the allowed destinations. Some countries are disabled by default and need to be enabled in the console (Messaging → Settings → Geo Permissions).
Conclusion
Smart Coop 2.0 brings together automation, electronics, and AI to create a smarter and more reliable chicken coop system. By improving the mechanical design, simplifying the electronics with a custom PCB, and adding vision and audio-based monitoring, the system becomes both powerful and easy to replicate.
The modular design makes it easy to assemble, maintain, and customize based on different needs. With features like automated door control, precise feeding, environmental monitoring, and intelligent detection, this system helps reduce manual effort while improving safety and efficiency.
Overall, Smart Coop 2.0 demonstrates how practical IoT and AI solutions can be applied to everyday problems, making poultry management more convenient and accessible.



















