An Artificial Voice for the Voiceless
A thin, wearable skin patch uses graphene-based strain sensors and deep learning to give a voice to those with speech impairments.
Speech is a fundamental aspect of human communication, allowing individuals to convey their thoughts, feelings, and ideas to others. For those with a speech impairment, however, the ability to communicate effectively can be significantly limited, making it difficult to engage in social interactions, participate in the workforce, or even perform basic tasks such as ordering food at a restaurant or making a phone call.
According to the American Speech-Language-Hearing Association, over 40 million Americans have communication disorders, with speech impairments among the most common. These impairments range from stuttering to more severe conditions such as aphasia or dysarthria, which can affect the ability to produce or understand speech. Many neurological conditions, including stroke, cerebral palsy, Parkinson’s disease, and dementia, also frequently lead to speech impairments.
In recent years, a number of efforts have sought to aid those who have difficulty with verbal communication. Microphone arrays, as well as individual microphones attached directly to the body, have been used in many devices to gather subtle sounds uttered by the vocally disabled, but these systems are large and/or intrusive, which makes them impractical for daily use. Likewise, cameras have successfully captured images of the face for lip-reading devices. However, keeping a camera pointed at one’s face is obtrusive and raises many privacy-related concerns.
As speech is formed by the actions of the throat and facial muscles, a team led by researchers at Tsinghua University decided to explore an approach that measures these movements and translates them into speech. Towards this end, they designed and developed a small, wearable patch that they call an artificial throat. When worn on the throat, it captures muscular motion and sound vibration data with onboard sensors. That data is then analyzed by machine learning algorithms that were trained to translate it into speech.
To create the artificial throat, the researchers used laser scribing to convert a polyimide film into flexible graphene strain sensors. These sensors maintain close contact with the skin over the larynx and detect the minute vibrations caused by muscle movements and sound waves. Because the patch senses mechanical vibration at the skin rather than airborne sound, the design is also resilient against interference from background noise and will not inadvertently pick up the voices of nearby speakers.
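The article does not describe how the raw sensor signal is prepared for the neural network, but a common approach for vibration data of this kind is to convert it into a spectrogram, a time-frequency "image" that vision-style models can consume. The sketch below illustrates that idea; the sampling rate, window sizes, and function name are illustrative assumptions, not the team's actual pipeline:

```python
import numpy as np
from scipy import signal

# Hypothetical parameters: the actual digitization rate and window
# sizes used by the Tsinghua team are assumptions for illustration.
SAMPLE_RATE_HZ = 8000   # sampling rate of the strain-sensor signal
WINDOW = 256            # STFT window length (samples)
OVERLAP = 128           # overlap between consecutive windows

def sensor_to_spectrogram(vibration: np.ndarray) -> np.ndarray:
    """Convert a raw 1-D vibration trace from the patch into a
    log-magnitude spectrogram, a 2-D array suitable for a CNN."""
    freqs, times, spec = signal.stft(
        vibration, fs=SAMPLE_RATE_HZ, nperseg=WINDOW, noverlap=OVERLAP
    )
    # Log scaling compresses the dynamic range, as is typical when
    # feeding acoustic-style features to vision models.
    return np.log1p(np.abs(spec))
```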
Measurements collected from the sensors were used to train a deep neural network. Rather than starting from scratch, the team used transfer learning: models that had previously produced excellent results on computer vision tasks were retrained for speech recognition, so that the knowledge already encoded in them could be reused. This approach proved highly effective, yielding very accurate translations of the movement data acquired by the skin patch.
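The team's training code is not public, but this style of transfer learning is straightforward to sketch in PyTorch. The ResNet-18 backbone, vocabulary size, and layer-freezing strategy below are all assumptions for illustration; the article does not name the initial architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # assumed size of the word/phrase vocabulary

# Start from a backbone pretrained on ImageNet (ResNet-18 is an
# illustrative choice; the article does not name the initial network).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor so its general-purpose
# visual knowledge is reused rather than relearned from scratch.
for param in model.parameters():
    param.requires_grad = False

# Replace the final 1,000-way ImageNet classifier with a head sized
# for the speech vocabulary; only this layer will be trained.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```

One practical wrinkle with this approach: ImageNet backbones expect three-channel inputs, so a single-channel spectrogram would typically be repeated across channels (and resized) before being fed in.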
A real-world trial of the system was conducted with the help of a laryngectomy patient. The neural network trained via transfer learning achieved an average speech recognition accuracy of 81.25%, which is reasonably good. The team wanted to improve on this result, however, so they created an additional AlexNet model and linked it with the initial neural network. This ensemble reached an impressive 91% accuracy for the laryngectomy patient.
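The article does not specify how the two networks were linked, but one minimal fusion scheme consistent with the description is to average the models' softmax outputs. The function below is a hypothetical sketch of that approach; the equal weighting is an assumption, and the paper's exact fusion method may differ:

```python
import torch
import torch.nn.functional as F

def ensemble_predict(model_a, model_b, spectrogram_batch):
    """Combine two classifiers by averaging their softmax outputs,
    one simple way to 'link' two networks into an ensemble."""
    model_a.eval()
    model_b.eval()
    with torch.no_grad():
        probs_a = F.softmax(model_a(spectrogram_batch), dim=1)
        probs_b = F.softmax(model_b(spectrogram_batch), dim=1)
    # Equal weighting is an assumption; the weights could instead be
    # tuned on a held-out validation set.
    return (probs_a + probs_b) / 2
```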
The results achieved in this study show that this new wearable system may be an effective tool to assist people with vocal disorders in recovering their ability to communicate. The researchers also see future applications for their sensors in intelligent home health-monitoring systems, wearable electronics, and cryptographic security.