With the hustle and hubbub that constantly surrounds us, it is easy to miss things that are important in our everyday lives. Events such as a dog barking to be let inside, or a baby crying alone in its room, can be lost in the constant chaos that seems to surround us. While increased awareness is great, too much sensory input can overwhelm us and cause us to miss the things that matter. We aim to use machine learning and the Neosensory Buzz to help make sense of our auditory surroundings and make us aware of the sounds that matter in our lives.
This project consists of two components: the sound inference base (Raspberry Pi 4B + Seeed ReSpeaker) and the Buzz haptic bracelet. The sound inference base collects audio, prepares it for classification, classifies the audio signal, and communicates the results to the Buzz bracelet via BLE. The Buzz bracelet provides 4 independently controllable motors, which are used to indicate the results of the audio classification.
Sound classification is achieved using a pre-trained model called YAMNet, a deep net that predicts 521 audio event classes based on the AudioSet-YouTube corpus. As classifications are made, the sound inference base translates them in the following, configurable ways:
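To make the classification step concrete, the sketch below shows one simple way to reduce YAMNet's frame-level output to clip-level confidences. With the real model (available on TensorFlow Hub), scores has shape [num_frames, 521], one row of class scores per audio frame; here a random array stands in for the model output so the sketch is self-contained.

```python
import numpy as np

# Stand-in for YAMNet output: the real model returns an array of shape
# [num_frames, 521], one row of per-class scores per audio frame.
rng = np.random.default_rng(0)
scores = rng.random((4, 521)).astype(np.float32)

# Average scores across frames, then take the top classes -- one simple
# way to turn frame-level output into clip-level confidences.
clip_scores = scores.mean(axis=0)
top5 = np.argsort(clip_scores)[::-1][:5]
print("top class indices:", top5)
print("confidences:", clip_scores[top5])
```

The averaging strategy is one reasonable choice; other reductions (e.g. max over frames) would also work depending on how responsive the bracelet should be.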
- Confidence to Motor - 4 separate audio events (such as Speech, Dog Bark, Baby Cry and Doorbell) are configured to the 4 different Buzz motors. The confidence generated by the YAMNet model for each event maps directly to motor intensity as Ci * max_intensity, where Ci ∈ [0.0, 1.0] is the confidence for the ith event class and max_intensity ∈ [0, 255].
- Sinusoidal - Generates a sinusoidal wave pattern that is distributed over all 4 Buzz motors and varies in amplitude.
- MEL Spectrogram - If a configured audio event's confidence is above the classification threshold, the MEL spectrogram generated by the YAMNet model is mapped to the 4 Buzz motors as follows. The spectrogram is the result of an STFT using 64 mel frequency bins logarithmically spaced from 125 Hz to 7500 Hz and covering a ~1 sec span. The mel bins are mapped linearly to the 4 Buzz motors (bins 1-16 to motor 1, bins 17-32 to motor 2, etc.); the motor that fires, and its intensity, represent the mel bin with the greatest amplitude.
The Buzz haptic bracelet has a BLE (Bluetooth Low Energy) interface that can be used to control Buzz from a reasonable range (we've had good results at 40+ meters). This allows the sound inference base to be placed almost anywhere in the house, or even the use of multiple sound inference bases (we have not prototyped this). The sound inference base attempts to hold the BLE connection indefinitely; if the connection is lost, a re-connection attempt is made. This allows the bracelet to move in and out of BLE range.

Demonstration Examples
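The connection-holding behavior described above can be sketched with a retry loop. The FlakyLink class is a stand-in for a real BLE connection (in practice a client library such as bleak would fill this role); its name, the attempt limit, and the linear backoff are all illustrative assumptions.

```python
import asyncio


class FlakyLink:
    """Stand-in for a BLE connection that fails a few times before
    succeeding, simulating a bracelet moving back into range."""

    def __init__(self, failures_before_connect=2):
        self._failures_left = failures_before_connect
        self.connected = False

    async def connect(self):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("out of range")
        self.connected = True


async def hold_connection(link, max_attempts=10, backoff_s=0.01):
    """Keep trying to (re)connect, as the sound inference base does when
    the BLE connection drops. Returns the number of attempts used."""
    for n in range(max_attempts):
        try:
            await link.connect()
            return n + 1
        except ConnectionError:
            # Linear backoff between attempts (an assumption for the sketch).
            await asyncio.sleep(backoff_s * (n + 1))
    raise RuntimeError("gave up reconnecting")
```

With two simulated failures, the loop succeeds on the third attempt; the real base would wrap this in an outer loop so a dropped connection always triggers a fresh reconnect cycle.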
The following plot shows inferences generated by the YAMNet model as well as the corresponding Buzz intensities generated by the Confidence to Motor algorithm described above. The sound samples were pulled from freesound.org.
An example of the MEL spectrogram format: