SmartListener is a low-cost, Wi-Fi-enabled edge AI device that listens for real-world sounds — like a baby crying, dog barking, or glass breaking — and sends MQTT alerts when specific sound events are detected. It uses a PSoC6 AI Kit running a lightweight ML model trained using DeepCraft Studio and features a custom 3D-printed enclosure for deployment in any room.
Key Features:
- Edge AI: Runs TensorFlow Lite models directly on the PSoC™ 6.
- 5+ Sound Classes: Detects baby_crying, fire_alarm, glass_breaking, footsteps, and dog_barks.
- MQTT Integration: Publishes JSON alerts to any broker (HiveMQ/Mosquitto).
- Low Latency: Real-time processing with minimal hardware.
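As a sketch only: the firmware described later in this write-up publishes the raw label string, so the JSON wrapper below is one hypothetical way the "JSON alerts" could be structured on a gateway or host, not the deployed payload format:

```python
import json
import time

def build_alert(label, device_id="smartlistener-01"):
    """Wrap a detected label in a JSON alert payload (schema is an assumption)."""
    return json.dumps({
        "device": device_id,      # hypothetical device identifier
        "event": label,           # detected sound class, e.g. "baby_crying"
        "timestamp": int(time.time()),
    })

payload = build_alert("baby_crying")
print(payload)
```

Any MQTT subscriber can then parse the payload with a standard JSON library.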
ESC-50 is a free, well-labeled, and widely used dataset in the field of environmental sound classification (ESC). We used custom Python scripts to split, resample, and convert the dataset to match the input format expected by DeepCraft Studio.
🧾 Script 1: Split ESC-50 by Class
```python
import pandas as pd
import shutil
import os

# Paths
ESC50_CSV = "./meta/esc50.csv"
ESC50_AUDIO_DIR = "./audio/"
OUTPUT_DIR = "./split_data/"

# Load CSV
df = pd.read_csv(ESC50_CSV)

# Create the base output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Loop through each unique category in the dataset
for category in df['category'].unique():
    # Create a folder for each category inside the output folder
    category_folder = os.path.join(OUTPUT_DIR, category)
    os.makedirs(category_folder, exist_ok=True)

    # Filter the filenames for the current category
    category_files = df[df['category'] == category]['filename'].tolist()

    # Copy each file into its respective category folder
    for file_name in category_files:
        src_path = os.path.join(ESC50_AUDIO_DIR, file_name)
        dst_path = os.path.join(category_folder, file_name)
        shutil.copy(src_path, dst_path)

    print(f"Copied {len(category_files)} files to folder: {category_folder}")

print("Finished splitting the dataset!")
```

🧾 Script 2: Convert to WAV + Resample to 16 kHz
```python
from pydub import AudioSegment
import os

# ---- SETTINGS ----
input_folder = "raw_audio_files"    # Folder where your audio files are
output_folder = "processed_audio"   # Where converted files will be saved
target_sample_rate = 16000          # 16 kHz
target_format = "wav"               # DeepCraft needs WAV

# ---- FUNCTION ----
def process_audio(file_path, output_path):
    audio = AudioSegment.from_file(file_path)
    audio = audio.set_frame_rate(target_sample_rate)
    audio = audio.set_channels(1)      # Mono
    audio = audio.set_sample_width(2)  # 16-bit PCM
    audio.export(output_path, format=target_format)

# ---- RUN ----
os.makedirs(output_folder, exist_ok=True)

for root, dirs, files in os.walk(input_folder):
    for file in files:
        if file.lower().endswith((".wav", ".mp3", ".flac", ".ogg", ".m4a")):
            input_path = os.path.join(root, file)
            output_path = os.path.join(output_folder, os.path.splitext(file)[0] + ".wav")
            print(f"Processing {input_path} -> {output_path}")
            process_audio(input_path, output_path)

print("\n✅ All files processed!")
```

These scripts ensure each audio file is 16 kHz, mono, 16-bit PCM, as required for PSoC 6 inference.
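To double-check the converted files before importing them into DeepCraft Studio, a quick sanity check with Python's built-in `wave` module can confirm the 16 kHz / mono / 16-bit requirements (a sketch; the demo file name is arbitrary):

```python
import wave

def is_deepcraft_ready(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wf:
        return (wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2)

# Demo: write a tiny compliant file and verify it
with wave.open("demo.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 2 bytes = 16-bit
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

print(is_deepcraft_ready("demo.wav"))
```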
DEEPCRAFT Studio
DEEPCRAFT™ Studio, by Infineon Technologies, is an end-to-end platform for developing and deploying Edge AI applications. It supports audio, radar, time-series, and computer vision data. With an intuitive, graph-based interface, it simplifies the machine learning workflow, from data collection to deployment. The platform also offers open-source Starter Models, enabling developers to quickly begin AI projects and deploy them to edge devices.
DEEPCRAFT Studio : Data Labeling
Before training the model, or even running preprocessing, the data needs to be imported into DEEPCRAFT Studio so it can be labeled. We created a Generic Graph UX project and added a "wav file" node to read the training data.
🎯 Selected Labels
- Baby crying
- Dog barking
- Glass breaking
- Door knock
- Background noise ("unknown")
- Other events (unlabeled)
After labeling all the data that will be used to train the model, it needs to be imported into the DEEPCRAFT classification project. The data is then split into three sets: a training dataset, a validation dataset, and a test dataset.
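DeepCraft Studio performs this split in its UI; for reference, an equivalent offline split could look like the sketch below (the 80/10/10 ratios and file names are assumptions for illustration):

```python
import random

random.seed(42)  # reproducible split

def split_dataset(files, train=0.8, val=0.1):
    """Shuffle and split a list of files into train/val/test subsets."""
    files = sorted(files)       # deterministic starting order
    random.shuffle(files)
    n_train = int(len(files) * train)
    n_val = int(len(files) * val)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

files = [f"clip_{i:03d}.wav" for i in range(40)]   # dummy file list
train_set, val_set, test_set = split_dataset(files)
print(len(train_set), len(val_set), len(test_set))  # 32 4 4
```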
The raw audio recordings were preprocessed using a carefully crafted feature extraction pipeline designed in DeepCraft Studio. The input audio clips were mono-channel PCM WAV files sampled at 16 kHz, and the preprocessing chain was optimized for both real-time inference and classification accuracy.
- Sliding Window: Segments the incoming audio stream into overlapping windows of ~32 ms for frame-based processing.
- Hann Smoothing: Reduces spectral leakage by applying a Hann window to each frame.
- Real Discrete Fourier Transform (RDFT): Converts the time-domain signal into the frequency domain using a real-valued FFT.
- Frobenius Norm: Reduces the FFT output to a single power spectrum per frame using the Frobenius norm.
- Mel Filterbank: Converts the power spectrum to a perceptually inspired Mel-scale representation, compressing the high-dimensional FFT output into 30 meaningful frequency bins.
- Clip: Ensures numerical stability by clipping extreme values.
- Logarithm: Applies log-compression to simulate human loudness perception and further normalize dynamic range.
- Final Sliding Window (Spectrogram Block): Creates spectrogram "snapshots" that are passed into the neural network for classification.
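Assuming a 512-sample (~32 ms at 16 kHz) window with 50% overlap, the chain above can be sketched in NumPy. This is a simplified illustration of the pipeline, not the code DeepCraft Studio generates, and the hop size and filterbank details are assumptions:

```python
import numpy as np

SAMPLE_RATE = 16000
WIN_LEN = 512     # ~32 ms at 16 kHz
HOP_LEN = 256     # 50% overlap (assumed)
N_MELS = 30

def mel_filterbank(n_mels, n_fft, sr):
    """Build triangular filters spaced evenly on the Mel scale."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)   # rising slope
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)   # falling slope
    return fb

def extract_features(audio):
    """Windowed log-Mel features: Hann -> RFFT -> power -> Mel -> clip -> log."""
    n_frames = 1 + (len(audio) - WIN_LEN) // HOP_LEN
    window = np.hanning(WIN_LEN)
    fb = mel_filterbank(N_MELS, WIN_LEN, SAMPLE_RATE)
    feats = []
    for i in range(n_frames):
        frame = audio[i * HOP_LEN:i * HOP_LEN + WIN_LEN] * window
        power = np.abs(np.fft.rfft(frame)) ** 2        # power spectrum per frame
        mel = fb @ power                               # 30-bin Mel energies
        feats.append(np.log(np.clip(mel, 1e-10, None)))  # clip then log-compress
    return np.array(feats)

one_second = np.random.randn(SAMPLE_RATE).astype(np.float32)
features = extract_features(one_second)
print(features.shape)  # (61, 30)
```

The resulting frames of 30 log-Mel values are what the final sliding window stacks into spectrogram snapshots for the classifier.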
To train the audio classification model, I used DeepCraft Studio's Model Wizard, which streamlines neural network architecture selection and tuning. The model was optimized for deployment on the PSoC 6 AI Kit with a focus on size, latency, and generalization.
The trained Conv1DLSTM model achieved 76.13% accuracy and a comparable F1-score of 76.08%, solid performance for a lightweight architecture. The confusion matrix (cells expressed as percentages of all test samples) shows strengths on distinct sounds like fire (10.25% of samples correctly classified) and baby_crying (11.32%), but difficulties with ambiguous or rare classes. For example, correct glass_breaking predictions accounted for only 1.27% of samples, with the class often misclassified as unknown (1.91%), likely due to limited training examples or subtle acoustic features. The model's tendency to label 39.46% of samples as unknown suggests conservative confidence thresholds or noise sensitivity, while confusion between dog and baby_crying (1.37%) hints at overlapping frequency patterns.
To improve robustness, future work could focus on data augmentation for underrepresented classes (e.g., synthetic glass_breaking samples) and adjusting confidence thresholds to reduce unknown predictions. Despite these challenges, the model outperforms a baseline CNN (~68% accuracy), making it a promising candidate for edge deployment with further optimization.
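Two simple augmentation transforms of the kind suggested above (white-noise mixing at a target SNR and random time shifting) can be sketched in NumPy; the SNR and shift parameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(audio, snr_db=20.0):
    """Mix in white noise at a given signal-to-noise ratio (in dB)."""
    rms = np.sqrt(np.mean(audio ** 2))
    noise_rms = rms / (10.0 ** (snr_db / 20.0))
    return audio + rng.normal(0.0, noise_rms, size=audio.shape)

def time_shift(audio, max_shift=1600):
    """Circularly shift the clip by up to +/- max_shift samples (0.1 s at 16 kHz)."""
    return np.roll(audio, rng.integers(-max_shift, max_shift + 1))

clip = rng.normal(size=16000).astype(np.float32)   # dummy 1-second clip
augmented = [add_noise(clip), time_shift(clip)]
print([a.shape for a in augmented])
```

Applying such transforms to underrepresented classes like glass_breaking multiplies the effective training data without new recordings.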
The DeepCraft Studio was configured to generate optimized C code (model.h/model.c) for deploying the audio classification model on Infineon's PSoC 6 microcontroller. Targeting the dual-core Cortex-M4/M0+ architecture, the setup enables CMSIS-accelerated floating-point operations while maintaining full precision (Float32) without quantization.
Deploying a machine learning model on the PSoC™ 6 AI Kit involves integrating the trained model into an embedded application using Infineon's development tools, ModusToolbox™ and DEEPCRAFT™ Studio. The model is embedded into the firmware running on the PSoC 6's dual-core ARM Cortex-M4F/M0+ microcontroller. Audio data from the onboard MEMS microphone is collected in real time and passed to the model for inference directly on the device, enabling low-latency, on-device AI processing.
To enable communication, the PSoC 6 AI Kit uses the AIROC™ CYW43439 module for Wi-Fi and MQTT connectivity. By integrating the MQTT client library within the application, the kit can publish inference results to an MQTT broker, making it ideal for edge-to-cloud use cases. This setup allows the application to perform inference locally and send results wirelessly, striking an efficient balance between edge intelligence and connected communication.
Deployment : PSoC 6 AI Kit Overview
The PSoC™ 6 AI Evaluation Kit (CY8CKIT-062S2-AI) by Infineon Technologies is a compact development platform for edge AI applications. It features a dual-core ARM Cortex-M4F/M0+ microcontroller, integrated sensors like a 6-axis motion sensor, radar, and a MEMS microphone, as well as wireless connectivity via the AIROC™ CYW43439 Wi-Fi/Bluetooth® combo module. Compatible with DEEPCRAFT™ Studio, the kit supports machine learning model training and deployment, making it ideal for applications like smart home devices, wearables, and industrial monitoring.
The project contains an RTOS task that reads the PDM (mic) input and passes it to the model for preprocessing and classification.
```c
/* Initialize the audio_buffer to zeroes and read data
 * from the PDM mic into it */
audio_count = AUDIO_BUFFER_SIZE;
memset(audio_buffer, 0, AUDIO_BUFFER_SIZE * sizeof(uint16_t));
result = cyhal_pdm_pcm_read(&pdm_pcm, (void *) audio_buffer, &audio_count);

/* Convert each integer sample to float and pass it to the model */
sample = SAMPLE_NORMALIZE(audio_buffer[i]) * DIGITAL_BOOST_FACTOR;
result = IMAI_enqueue(&sample);
```

The inferencing task checks the model's result and applies some debouncing before sending the detected label to the MQTT task for publishing.
```c
if (best_label == last_seen_label)
{
    debounce_counter++;
}
else
{
    debounce_counter = 1;
    last_seen_label = best_label;
}

// Confirm label after debounce threshold
if (debounce_counter >= DEBOUNCE_THRESHOLD && confirmed_label != best_label)
{
    confirmed_label = best_label;
    debounce_counter = 0; // Reset to avoid retriggering

    // Filter out "unlabeled" and "unknown"
    if (confirmed_label != 0 && confirmed_label != 6)
    {
        printf("✅ Debounced Output: %-30s\r\n", label_text[confirmed_label]);

        // 🔔 Send to MQTT or trigger action here
        publisher_q_data.cmd = PUBLISH_MQTT_MSG;
        publisher_q_data.data = (char *)label_text[confirmed_label];
        xQueueSend(publisher_task_q, &publisher_q_data, 0);
    }
    else
    {
        printf("⛔ Ignored Label (Unlabeled/Unknown): %s\r\n", label_text[confirmed_label]);
    }
}
```

Deployment : MQTT Communication
The MQTT communication uses the following configuration to select the broker (HiveMQ) and the topic used by the publisher:
```c
#define MQTT_BROKER_ADDRESS "broker.hivemq.com"
#define MQTT_PORT 1883
#define MQTT_PUB_TOPIC "SmartListener"
```

Once a label is confirmed by the model, it is sent to the MQTT task for publishing.
```c
/* Wait for commands from other tasks and callbacks. */
if (pdTRUE == xQueueReceive(publisher_task_q, &publisher_q_data, portMAX_DELAY))
{
    ...
    case PUBLISH_MQTT_MSG:
    {
        /* Publish the data received over the message queue. */
        publish_info.payload = publisher_q_data.data;
        publish_info.payload_len = strlen(publish_info.payload);

        printf("\nPublisher: Publishing '%s' on the topic '%s'\n",
               (char *) publish_info.payload, publish_info.topic);

        result = cy_mqtt_publish(mqtt_connection, &publish_info);
        break;
    }
}
```

Deployment : Final Testing
After flashing the board with the firmware and connecting the Li-Ion battery, we can place the device in the custom-made enclosure and let it listen to the surrounding environment, detecting the predefined sounds and reporting them to the cloud.
We can check what the board is detecting using any MQTT client. In my case, I use the "IoT MQTT Panel" application to see what is happening around the board when I'm away from home.