Hey! Who Said That?

Direction-of-voice estimation algorithms tell smart speakers whether we are talking to them, without the need for wake words.

Nick Bild
3 years ago • Machine Learning & AI
(📷: K. Ahuja et al.)

As voice becomes a more pervasive means of communicating with our devices, forward-looking individuals are working to make that experience more natural. After all, the way we presently talk to our smart speakers is hardly the way we talk with other people. We know, for example, that we have to phrase our request in a certain way so as to get a sensible response. We also have to start with a specific wake word to get the attention of the device.

A group of researchers at Carnegie Mellon University is working on a technique that may eliminate the need for wake words. The approach gives smart devices the ability to determine the direction from which a voice is coming, and thereby infer whether or not the speaker was talking to them. Not only is this more natural, but consider what it would be like if you had a dozen different devices, each with its own wake word to remember.

In contrast to direction-of-arrival approaches, which calculate the point from which a voice originates, this new direction-of-voice method determines the direction in which a voice was projected. Leveraging the knowledge that the distribution of human speech frequencies varies with spoken angle, the team trained an Extra-Trees classifier using scikit-learn (sklearn). Data was collected with a ReSpeaker USB 4-channel microphone made by Seeed Studio, and processing was done on a MacBook Pro. The inference algorithm is lightweight and can run on-device, without privacy-compromising processing in the cloud.
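To make the idea concrete, here is a minimal sketch of how such a classifier might be put together. It is not the researchers' code: the synthetic "utterances", the 16 kHz sample rate, the eight-band log-energy features, and the helper names `synth_utterance` and `band_energies` are all assumptions made for illustration. The only premise taken from the work is that the frequency distribution of speech shifts with the angle at which it is spoken, which is simulated here by attenuating a high-frequency tone when the speaker faces away.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

RATE = 16_000      # assumed sample rate in Hz
N_SAMPLES = 2048   # assumed length of one analysis window


def synth_utterance(facing, rng):
    """Hypothetical stand-in for a recording: a low tone plus a high tone.

    The high tone is attenuated when the speaker faces away, a crude
    imitation of the angle-dependent frequency distribution.
    """
    t = np.arange(N_SAMPLES) / RATE
    low = np.sin(2 * np.pi * 300 * t + rng.uniform(0, 2 * np.pi))
    high = np.sin(2 * np.pi * 3500 * t + rng.uniform(0, 2 * np.pi))
    high_gain = 1.0 if facing else 0.3
    noise = 0.1 * rng.standard_normal(N_SAMPLES)
    return low + high_gain * high + noise


def band_energies(signal, n_bands=8):
    """Log of the share of spectral energy falling in each frequency band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    energies = np.array([band.sum() for band in np.array_split(spectrum, n_bands)])
    return np.log(energies / energies.sum())


rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)  # 1 = facing the device, 0 = facing away
X = np.array([band_energies(synth_utterance(bool(y), rng)) for y in labels])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real recordings are far messier than this toy data, so the score printed here says nothing about real-world performance. The sketch only shows the overall shape of the pipeline: spectral features go in, a facing or not-facing label comes out.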

The system was evaluated across people, time, utterances, rooms, device placement, user position, and spoken angle. In these tests, it achieved accuracy greater than 93%. While this is a good result, it falls short of the greater-than-99% accuracy that has come to be expected of consumer devices. Moreover, talking in the direction of a device does not necessarily indicate intent to communicate with it. Perhaps future enhancements will overcome these issues, and we will see direction-of-voice algorithms in the smart devices in our homes.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.