PARP Pruning Approach Boosts Performance, Reduces Error Rate of Automatic Speech Recognition Models

Designed to cut down on computationally costly fine-tuning, this prune-adjust-re-prune method can boost both performance and accuracy.

A team of researchers at the Massachusetts Institute of Technology (MIT), UC Santa Barbara, and National Taiwan University has come up with a new way to reduce the size of speech recognition networks and improve their performance, without a loss in accuracy: Prune, Adjust, and Re-Prune, or PARP.

"[PARP] discovers and fine-tunes subnetworks for much better performance, while only requiring a single downstream ASR [Automatic Speech Recognition] fine-tuning run," the researchers explain of their work, which was brought to our attention by IEEE Spectrum. "PARP is inspired by our surprising observation that sub-networks pruned for pre-training tasks need merely a slight adjustment to achieve a sizeable performance boost in downstream ASR tasks."

The idea behind PARP is to take a pre-trained speech recognition model, run a pruning pass that simply sets weak links' strengths to zero rather than removing them from the model altogether, then run a fine-tuning pass on labeled data during which those zeroed weights are free to adjust before the network is pruned again. This cuts out the double fine-tuning approach used by other pruning methods like One-shot Magnitude Pruning (OMP).
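For readers who want to see the shape of that loop in code, the snippet below is a minimal PyTorch sketch of a prune-adjust-re-prune cycle using magnitude-based masking. It is illustrative only: the tiny linear model, random data, sparsity level, and re-prune interval are placeholder assumptions, not the wav2vec 2.0 setup the researchers actually used.

```python
# Minimal prune -> adjust -> re-prune sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that keeps the largest-magnitude weights."""
    k = int(weight.numel() * sparsity)  # number of weights to zero out
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

model = nn.Linear(64, 32)  # stand-in for a pre-trained network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
sparsity = 0.5

# 1) Prune: zero out low-magnitude weights, but keep them in the model.
mask = magnitude_mask(model.weight.data, sparsity)
model.weight.data *= mask

for step in range(100):
    # 2) Adjust: ordinary fine-tuning updates *all* weights, so the
    #    zeroed ones can grow back if the downstream task needs them.
    x = torch.randn(16, 64)
    target = torch.randn(16, 32)
    loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 3) Re-prune: periodically recompute the mask at the same sparsity
    #    and zero whichever weights are now the weakest.
    if step % 10 == 0:
        mask = magnitude_mask(model.weight.data, sparsity)
        model.weight.data *= mask
```

Because only one fine-tuning run is needed, the pruning and adjustment happen within the same training loop rather than across separate prune-then-retrain stages.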

The results are impressive: "On the 10min Librispeech split without LM decoding, PARP discovers sub-networks from wav2vec 2.0 with an absolute 10.9%/12.6% WER [Word Error Rate] decrease compared to the full model," the researchers write. "We further demonstrate the effectiveness of PARP via: cross-lingual pruning without any phone recognition degradation, the discovery of a multi-lingual sub-network for 10 spoken languages in one fine-tuning run, and its applicability to pre-trained BERT/XLNet for natural language tasks."

In other words: PARP requires less computational effort than OMP, and in some cases produces a network that is not only smaller but demonstrably less error-prone than its unpruned equivalent.

The team's work is to be presented at the Neural Information Processing Systems (NeurIPS) conference this month, and is available under open access terms on OpenReview.net. A source code repository linked to the paper has not yet been populated, however.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.