Neural Acoustic Fields Let You Map How the World Sounds — and Simulate Listening From Anywhere

Borrowing techniques from visual scene modeling, NAFs can simulate how sound is perceived from any location in a space, or even help build a map of that space.

Researchers at the Massachusetts Institute of Technology and Carnegie Mellon University have developed a mapping system for sound, allowing machine learning models to learn how sound propagates through a space and to simulate what a person standing at any given location would hear.

"Most researchers have only focused on modeling vision so far," explains Yilun Du, graduate student at MIT's department of electrical engineering and computer science and co-author of the paper, of the team's work. "But as humans, we have multimodal perception. Not only is vision important, sound is also important. I think this work opens up an exciting research direction on better utilizing sound to model the world."

A Neural Acoustic Field (NAF) lets you simulate what audio sounds like from any location, a potential boon for VR and the metaverse. (📹: Luo et al)

The team's concept of "Neural Acoustic Fields" builds heavily on prior work that reconstructs 3D scenes from visual input using an implicit neural representation model. Rather than images, though, the team's variant works with sound, which meant finding a way around such models' reliance on photometric consistency: the phenomenon whereby an object looks roughly the same regardless of where you're standing, and one that has no acoustic equivalent.
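
For readers familiar with the visual version of that idea, a rough sketch may help. The snippet below is an illustrative assumption rather than the authors' published architecture: instead of mapping a 3D point to color, a small PyTorch MLP maps source and listener positions plus a time-frequency coordinate to one bin of an impulse-response spectrogram. The names AcousticFieldMLP and fourier_features are hypothetical.

```python
# Hypothetical sketch of an implicit acoustic field (not the paper's exact model).
# A NeRF-style MLP maps (source position, listener position, frequency, time)
# to the log-magnitude of the room's impulse response at that time-frequency bin.
import torch
import torch.nn as nn


def fourier_features(x: torch.Tensor, n_freqs: int = 8) -> torch.Tensor:
    """Sinusoidal positional encoding, as commonly used in implicit neural representations."""
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device) * torch.pi
    angles = x.unsqueeze(-1) * freqs              # (..., dims, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(start_dim=-2)


class AcousticFieldMLP(nn.Module):
    def __init__(self, hidden: int = 256, n_freqs: int = 8):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = (2 + 2 + 1 + 1) * 2 * n_freqs    # 2D source + 2D listener + freq + time
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                 # predicted log-magnitude for this bin
        )

    def forward(self, src_xy, lis_xy, freq, time):
        coords = torch.cat([src_xy, lis_xy, freq, time], dim=-1)
        return self.net(fourier_features(coords, self.n_freqs))
```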

To work around the problem, the team's NAF approach takes into account the reciprocal nature of sound, in which swapping the positions of the source and the listener does not change what is heard, as well as the influence of local features such as furniture or carpeting on sound as it travels and bounces. The model captures those local features on a grid laid over the scene, randomly sampling points on that grid and learning from what it finds at each one.
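
That reciprocity can be baked into a model in several ways; the minimal sketch below shows one of them, building an input that is identical whichever role the two positions play. It illustrates the principle only and is not necessarily the mechanism the paper uses; reciprocal_encoding is a hypothetical helper.

```python
# One way to encode acoustic reciprocity (illustrative, not necessarily the
# paper's mechanism): build features that stay the same when the source and
# listener positions are swapped, so the model cannot tell the two roles apart.
import torch


def reciprocal_encoding(src_xy: torch.Tensor, lis_xy: torch.Tensor) -> torch.Tensor:
    """Return features invariant to exchanging source and listener."""
    summed = src_xy + lis_xy            # unchanged by the swap
    spread = (src_xy - lis_xy).abs()    # unchanged by the swap
    return torch.cat([summed, spread], dim=-1)


src = torch.tensor([[1.0, 2.0]])
lis = torch.tensor([[4.0, 0.5]])
assert torch.allclose(reciprocal_encoding(src, lis), reciprocal_encoding(lis, src))
```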

"If you imagine standing near a doorway, what most strongly affects what you hear is the presence of that doorway, not necessarily geometric features far away from you on the other side of the room," explains first author Andrew Luo. "We found this information enables better generalization than a simple fully connected network."

The finished NAF model is fed both visual information and spectrograms of what a given audio sample sounds like at selected points in the area, giving it the ability to predict what happens to the sound when the listener moves. Compared with prior approaches to modeling acoustic information, the team claims its approach offers improved accuracy across the board, and it can even be run in reverse to improve a visual map of an area or to create one from scratch.
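
In practice, that kind of training can be as simple as regressing measured spectrogram values at known source and listener positions, then querying the fitted field at a position that was never recorded. The sketch below is a generic stand-in under those assumptions, with random placeholder tensors where real recorded spectrograms would go.

```python
# Generic training sketch (hypothetical): fit a field network to spectrogram
# bins measured at known source/listener positions, then query a new position.
# The "measurements" here are random placeholders, not real data.
import torch
import torch.nn as nn

field = nn.Sequential(                 # stand-in for the acoustic field network
    nn.Linear(6, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
optimizer = torch.optim.Adam(field.parameters(), lr=1e-3)

# Each sample: source (x, y), listener (x, y), frequency bin, time frame.
inputs = torch.rand(1024, 6)           # placeholder coordinates
targets = torch.randn(1024, 1)         # placeholder log-magnitude bins

for step in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(field(inputs), targets)
    loss.backward()
    optimizer.step()

# Query: what does the same source sound like from a listener position
# that was never measured?
new_query = torch.tensor([[0.2, 0.8, 0.6, 0.1, 0.5, 0.5]])
with torch.no_grad():
    predicted_bin = field(new_query)
```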

"When you only have a sparse set of views," Du explains, "using these acoustic features enables you to capture boundaries more sharply, for instance. And maybe this is because to accurately render the acoustics of a scene, you have to capture the underlying 3D geometry of that scene."

A preprint of the team's work is available on Cornell's arXiv server under open-access terms; additional information can be found on the project website, with the code published to GitHub under an unspecified open-source license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.