Currently, the main roadblock in the COVID-19 response is the inability to perform large-scale testing. Agile models that enable cost-effective testing and large-scale deployment offer distinct advantages. To address this problem, we propose an AI-based approach that analyzes audio signal patterns in cough recordings to screen for symptoms of COVID-19. The approach is built around a DS-CNN (depthwise separable convolutional neural network) model. The AI engine runs on an Arm Cortex microcontroller that provides real-time inference. The model is trained on an external CPU/GPU and deployed on a low-power board. A microphone captures coughing sounds in real time, and the inference result is shown on the OLED screen connected to the board. By analyzing audio samples with an AI engine running on the edge, the model can return a preliminary diagnosis within a few seconds. Since coughing is one of the early symptoms of coronavirus infection, our solution is designed to precede the clinical COVID-19 test, so that preliminary screening can be performed before the actual test. AI-on-the-edge technology avoids the delays that can occur when models are deployed in the cloud. Consequently, it also works in areas with poor network connections. In short, our mechanism is aimed at the front line to help expand the scope of clinical testing.
Proposed Plan
Our proposed method works around resource limitations by using tools and frameworks that were developed specifically to make a prototype or an idea robust.
The flow starts with the acquisition of human cough sounds. A captured audio signal of 2-5 seconds contains enough information to draw a conclusion about the person's respiratory health. The audio signal is then processed by the DSP block of the Arm Cortex-M processor, whose main purpose is to clean the signal and remove background noise. Features are then extracted from the audio signal using Fourier transforms, which gives us the Mel-Spectrogram of the raw audio signal.
A trained neural network model, optimized and tuned for the target, runs inference on the processed Mel-Spectrograms. The spectrograms are fed into the neural network running on the Arm Cortex-M processor using the CMSIS-NN inference engine. The inference result is displayed locally, and further conclusions can be derived from it.
We use cough sounds gathered from the ESC-50 dataset to train the model to distinguish between cough and non-cough sounds. We follow the same flow: the raw audio signals are converted into Mel-Spectrogram images, which are then fed forward through the neural network.
We trained our DS-CNN based model on a database of respiratory sounds covering various respiratory diseases, namely URTI, asthma, COPD, LRTI, bronchiectasis, pneumonia, and bronchiolitis. This gives us the confidence to train our model on a COVID-19 database of respiratory or cough sounds to reach the final stage.
The proposed architecture needs very little infrastructure compared with a cloud-based deployment. Its self-contained deployment makes it easy to integrate with other existing systems. Running all the processing on the edge removes the dependence on network connectivity and makes this project plug and play.
This project has two main parts: signal processing of audio signals and deploying the AI engine on the edge. The main constraint of this project is that it must not rely on cloud technologies, so all processing is performed with the fundamental tools available on hardware-constrained devices such as a microcontroller.
Signal Processing on the Edge
We implemented the Mel spectral analysis of our audio signals using the Librosa library. We performed a short-time Fourier transform on the raw cough sounds and then converted the amplitude spectrogram into a dB-scaled spectrogram. We then plotted the Mel-Spectrogram along the time and frequency axes.
For implementation on the edge, the signal processing will be carried out by the CMSIS-DSP library, which provides common signal processing functions for use on Cortex-A and Cortex-M processor-based devices.
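The steps described here (short-time Fourier transform, Mel filtering, dB scaling) can be sketched in a few lines. This is a minimal NumPy illustration, not our Librosa code: the function names, the frame size of 512, the hop of 256, and the 40 Mel bands are all assumptions chosen for the example, and it works from the power spectrum with 10·log10 rather than from amplitudes.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram_db(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Return a dB-scaled Mel-Spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # Power spectrum of each frame (the STFT magnitude squared).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project the linear-frequency bins onto the Mel bands.
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Convert to decibels, flooring tiny values to avoid log(0).
    db = 10.0 * np.log10(np.maximum(mel_power, 1e-10))
    # Reference the scale to the peak, so the maximum is 0 dB.
    return (db - db.max()).T

# Example: one second of a 1 kHz tone standing in for a cough recording.
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram_db(np.sin(2 * np.pi * 1000.0 * t), sr)
print(spec.shape)
```

The resulting array can be plotted with time on one axis and Mel frequency on the other, which is the image-like input the neural network consumes.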
AI Engine on Edge
Finding the best neural network model is an important factor for edge inference. The chosen model should have a minimal memory footprint while keeping reasonable accuracy compared with compute-expensive models. When developing neural network solutions on Cortex-M, hardware-constrained model search is the first step, and it largely determines the rest of the workflow.
We need a compact model that fits within the memory of the Cortex-M system and uses few enough operations to achieve real-time performance. We chose the MobileNetV2 model, which is built on depthwise separable convolutions (DS-CNN), because it is designed for resource-constrained devices.
The next step is the quantization of the chosen model, which directly affects model size and performance. The trained model weights are 32-bit floating-point numbers, which can represent a very wide range of values, both very large and very small. Quantization is a compression technique that maps these weights onto 8-bit fixed-point numbers, discarding range and precision the model does not need. This reduces the model size and speeds up inference, ideally with little loss of accuracy.
Our model will be deployed on an edge device, which will not execute a high-level scripting language like Python but a faster low-level language like C, in which all our embedded applications have been written so far. Model translation compiles the Python-based model into code that runs on the inference engine and uses the limited resources efficiently. Arm NN translates the trained model into code that runs on Cortex-M cores using CMSIS-NN functions.
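To make the memory argument concrete, the weight count of a standard convolution can be compared with that of a depthwise separable convolution, the building block of DS-CNN models. The sketch below is plain arithmetic; the layer dimensions (3x3 kernel, 64 input and 64 output channels) are illustrative assumptions, not the dimensions of our network, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 64     # hypothetical layer shape
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
print(std, dsc, round(std / dsc, 1))   # 36864 4672 7.9
```

For this example layer, the separable form needs roughly an eighth of the weights, which is why such architectures are attractive when the whole model must fit in microcontroller memory.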
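As an illustration of the idea, the following sketch quantizes a handful of floating-point weights to 8-bit integers and measures the round-trip error. It assumes a symmetric, per-tensor linear scheme with a single scale factor, which is one common option and not necessarily the exact scheme our toolchain applies; the function names and the sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 8-bit integers back to approximate floating-point weights."""
    return [scale * v for v in q]

weights = [0.5, -1.27, 0.003, 1.0]   # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err <= scale / 2 + 1e-12)
```

Each weight now occupies one byte instead of four, and the error per weight is bounded by half a quantization step. Very small weights, such as 0.003 here, collapse to zero, which is the kind of precision loss that has to be checked against validation accuracy.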
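One small part of model translation is getting the trained weights out of the Python environment and into a form that C firmware can compile in. The sketch below shows that step only, under our own assumptions: `to_c_array` is a hypothetical helper, and the array layout it emits is illustrative rather than the format produced by Arm NN or expected by any particular CMSIS-NN function.

```python
def to_c_array(name, values, per_line=8):
    """Render int8 values as a C source snippet (hypothetical layout)."""
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"{v} does not fit in int8")
    lines = []
    for i in range(0, len(values), per_line):
        chunk = ", ".join(str(v) for v in values[i:i + per_line])
        lines.append(f"    {chunk},")
    body = "\n".join(lines)
    macro = f"{name.upper()}_LEN"
    return (
        "#include <stdint.h>\n\n"
        f"#define {macro} {len(values)}\n"
        f"static const int8_t {name}[{macro}] = {{\n{body}\n}};\n"
    )

# Example with a made-up layer name and four quantized weights.
print(to_c_array("conv1_wt", [50, -127, 0, 100]))
```

Declaring the array `static const` lets the compiler place the weights in flash rather than RAM on typical microcontroller toolchains, which matters when RAM is the tighter constraint.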
The translated model is then deployed on the edge device using the CMSIS-NN inference engine. CMSIS-NN is reported to improve throughput by 4.6X and energy efficiency by 4.9X.
This improvement in inference time makes the system faster and better suited to its use case, where testing has to be done on a massive scale.
The processed Mel-Spectrogram images are then fed into the deployed model. Inference runs in real time, and the total turnaround time is expected to be less than a second from the moment the user finishes recording the cough audio.
We implemented the project in two separate phases, because at present we do not have access to COVID-19 cough or respiration sounds.
We trained a model on cough sounds and implemented it on the STM32 edge device.
We trained a separate model for the respiratory sound database to detect several respiratory diseases.
We extracted the cough sounds from the ESC-50 database. We used the Edge Impulse daemon to train our neural network model on cough sounds after extracting MFCC features from the raw audio signals. We deployed the binary file to the DISCOVERY-IOT1A board for real-time inference on the edge.
Our hypothesis is that COVID-19 cough sounds have unique latent features that can be extracted using MFCC. We can then retrain the model, using the same architecture, on a COVID-19 dataset and deploy it on the edge.
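MFCC features are obtained from the log-Mel energies of each audio frame by applying a discrete cosine transform and keeping the first few coefficients. The sketch below shows only that final step, using an unnormalized DCT-II; the function names, the choice of 13 coefficients, and the sample frame are assumptions for illustration, and the exact parameters of the feature extraction block we used are not reproduced here.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def mfcc_from_log_mel(log_mel_frame, n_coeffs=13):
    """Keep the first n_coeffs DCT coefficients of one log-Mel frame."""
    return dct_ii(log_mel_frame)[:n_coeffs]

# Example: a made-up frame of 20 log-Mel energies with a gentle spectral tilt.
frame = [-10.0 - 0.5 * i for i in range(20)]
coeffs = mfcc_from_log_mel(frame)
print(len(coeffs))   # 13
```

The low-order coefficients summarize the overall shape of the spectrum in a compact vector, which keeps the input to the classifier small.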
Detection of other respiratory diseases
We trained the same model using the respiratory sound database to classify several respiratory diseases.
Following the same procedure as for cough detection, we achieved a decent validation accuracy. We used the same model architecture and similar signal processing techniques to extract the features.
A similar technique can be applied for COVID-19 detection through sounds.
The initial objectives were successfully implemented in the prototype. Considerable research is still needed to bring all the pieces together and meet all the targets of this project. Building the first prototype will help us understand the project's constraints better. The unavailability of a COVID-19 audio dataset is the main obstacle to this project right now.
For now, we train our AI engine on similar respiratory diseases so that the project is ready to be tested on COVID-19 data.
Because this project does not need any network connectivity, villages and small towns can adopt it as an alternative to the existing body temperature test, which is not at all conclusive as to whether a patient is COVID-19 positive. Set up as a stand-alone device in public places, it can support effective pre-screening of the masses without putting the lives of frontline healthcare workers in danger.
Future work will address the security of the database and the robustness of the whole system for long-term deployment.