MelGAN Adversarial Network Strips Identifying Features From Smart Assistant Audio Recordings

Given audio recordings as input, MelGAN can strip out identifying features without losing informational value, or generate new features in their place.

Researchers from the Chalmers University of Technology and the RISE Research Institutes of Sweden have released a system dubbed MelGAN, which they say can strip out identifying information from smart assistant audio recording data sets — right down to the gender of the person speaking.

"As more and more data is collected in various settings across organizations, companies, and countries, there has been an increase in the demand of user privacy," the team, led by David Ericsson, explains. "Developing privacy preserving methods for data analytics is thus an important area of research. In this work we present a model based on generative adversarial networks (GANs) that learns to obfuscate specific sensitive attributes in speech data."

"We train a model that learns to hide sensitive information in the data, while preserving the meaning in the utterance. The model is trained in two steps: first to filter sensitive information in the spectrogram domain, and then to generate new and private information independent of the filtered one. The model is based on a U-Net CNN that takes mel-spectrograms as input. A MelGAN is used to invert the spectrograms back to raw audio waveforms. We show that it is possible to hide sensitive information such as gender by generating new data, trained adversarially to maintain utility and realism."
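For readers unfamiliar with the input representation the team mentions, the following is a minimal sketch of how a log mel-spectrogram can be computed from raw audio, using only the Python standard library. It is not taken from the researchers' code: the function names, the HTK-style mel formula, and the frame size, hop length, and band count are all assumptions chosen for illustration, and a real pipeline would typically use an FFT-based audio library instead of the naive transform shown here.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style mel scale (an assumed variant; others exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per band."""
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edges: filter m rises from edge m to m+1, then falls to m+2.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def power_spectrum(frame):
    """Naive DFT of one Hann-windowed frame; fine for a demo, slow for real audio."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))) ** 2
        for k in range(n // 2 + 1)
    ]


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each holding n_mels log-energy values."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        # Small constant avoids log(0) for silent bands.
        frames.append([math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                       for row in bank])
    return frames


# Usage: a synthetic 440 Hz tone stands in for a speech recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(2048)]
spec = mel_spectrogram(tone, sample_rate)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting grid of frames and mel bands is the kind of image-like array a convolutional network such as a U-Net can process; turning a modified grid back into audio is the job the quoted passage assigns to the MelGAN vocoder.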

The MelGAN system takes original audio, pictured top-right, and generates new privacy-preserving variants. (📷: Ericsson et al)

When processed through the MelGAN system, the audio loses the characteristics that would enable a listener to identify a particular speaker. Interestingly, it retains key features such as intonation, which a voice assistant may use when deciding how to respond. Better still, the system allows for the insertion of 'false' features: a data set containing only recordings of female voices could, for example, form the basis of a generated male-voice data set.

The team's work, brought to our attention by VentureBeat, has been published under open-access terms on DeepAI.org; the Python source code for MelGAN, meanwhile, can be found on GitHub under the permissive MIT License.

Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.
