The Sound of Deception

This exploit can fool the countermeasures that voice authentication systems deploy to detect artificially generated speech samples.

Nick Bild

Voice authentication systems, also known as voice biometrics or speaker recognition systems, are emerging technologies designed to identify and verify individuals based on their unique vocal characteristics. These systems use the distinct attributes of an individual's voice, such as pitch, tone, cadence, and pronunciation, to establish identity. Voice authentication has gained prominence as a reliable method of user verification due to its convenience, its non-intrusiveness, and the inherent difficulty of replicating or forging a voice.

Adoption of these systems has been rapidly increasing in the financial industry in particular. Banks and other financial institutions now often employ voice biometrics to authenticate customers during telephone banking transactions. By using voiceprints, these systems can verify the caller's identity and ensure secure access to account information and funds. This application helps mitigate fraud risks associated with traditional security measures like knowledge-based authentication questions.

Whether in banking or another security-critical application, voice authentication systems work by first asking an individual to repeat a phrase several times. From these samples, the individual’s unique vocal signature is extracted and stored. When that person subsequently needs to authenticate, they are asked to repeat a new phrase that they have not previously provided to the system. Vocal features are extracted from this new phrase and compared against the vocal signature on file to determine whether they match.
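To make the enrollment-and-challenge flow concrete, the Python sketch below shows a deliberately simplified speaker check. It is an illustration only, not how any commercial product works: real systems rely on trained speaker-embedding models rather than averaged MFCCs, and the file names, similarity threshold, and librosa/numpy dependencies are assumptions made for the example.

import numpy as np
import librosa

def voiceprint(wav_path, n_mfcc=20):
    # Crude fixed-length "voiceprint": average MFCC features over time.
    audio, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def matches(enrolled, candidate, threshold=0.85):
    # Cosine similarity between the stored signature and the new sample.
    cos = np.dot(enrolled, candidate) / (np.linalg.norm(enrolled) * np.linalg.norm(candidate))
    return cos >= threshold

# Enrollment: the user repeats a phrase several times (hypothetical files).
enrolled = np.mean([voiceprint(p) for p in ["rep1.wav", "rep2.wav", "rep3.wav"]], axis=0)

# Verification: a new, previously unseen phrase is compared with the stored signature.
print(matches(enrolled, voiceprint("challenge.wav")))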

Speaker recognition can be achieved with a high degree of accuracy using these systems; however, voice spoofing techniques have raised questions about how much confidence we should place in them. Recent advances in generative artificial intelligence, for example, make it possible to replicate an individual’s voice given only a few minutes of recorded samples. Accordingly, in the cat-and-mouse game of security, countermeasures have been built into voice authentication systems to detect artificially synthesized speech.

These countermeasures are no checkmate, however, say a pair of security researchers at the University of Waterloo in Canada. They have developed a system that analyzes spoofed speech to find the characteristics that reveal it to be inauthentic, and then removes those features. The technique requires no knowledge of the target system and works universally against all countermeasure detection systems, at least for now.

The team notes that the countermeasures in present authentication systems learn to distinguish real from synthetic speech by keying on cues that are both easy to identify and easy to forge. By uncovering the nature of these audio cues, the researchers were able to devise a method for removing them while still preserving the original textual content of the speech sample.
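To make the idea of a brittle cue concrete, the toy sketch below (not the Waterloo team's actual analysis) computes the kind of single, narrow feature a countermeasure might latch onto: the share of spectral energy above a fixed frequency, which a hypothetical vocoder could over- or under-produce. The file name, 7 kHz split, and librosa/numpy dependencies are assumptions for illustration. Precisely because such a cue is confined to one region of the signal, it can be reshaped without altering the spoken words, which is why detectors built on cues like this are fragile.

import numpy as np
import librosa

def high_band_energy_ratio(wav_path, split_hz=7000):
    # Fraction of total spectral energy above split_hz: a toy, easily forged cue.
    audio, sr = librosa.load(wav_path, sr=None)
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    return spectrum[freqs >= split_hz].sum() / spectrum.sum()

# A hypothetical countermeasure might flag a sample as synthetic whenever this
# ratio falls outside the range observed in genuine recordings.
print(high_band_energy_ratio("sample.wav"))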

With only six attempts, the attack was observed to fool many common voice authentication systems in 99% of cases. Even enterprise-grade systems like Amazon’s Connect were deceived 40% of the time after just 30 seconds’ worth of attempts. These results seriously call into question the amount of trust we should place in today’s voice authentication systems. The countermeasures presently deployed appear to be a common Achilles’ heel impacting nearly all of them.

The researchers did not do this work because they want to compromise anyone’s bank account. Rather, they hope that by demonstrating this flaw, and the insecurity it can lead to, companies will act to better secure their voice-based authentication systems. Perhaps the information revealed will also assist in the development of stronger countermeasures in the future.
