"Digital Ventriloquism" Brings Interactivity to Otherwise Silent Objects Through Clever Projection
Built using a Raspberry Pi and Arduino, the proof-of-concept allows a smart speaker system to throw its voice to dumb household objects.
A trio of researchers at Carnegie Mellon University (CMU) and the University of Michigan has published details of what they describe as "digital ventriloquism": the ability to project responses from smart speakers onto any object in the room.
"Smart speakers with voice agents are becoming increasingly common. However, the agent's voice always emanates from the device, even when that information is contextually and spatially relevant elsewhere," the team explains of the central problem to be addressed. "Digital Ventriloquism allows smart speakers to render sound onto everyday objects, such that it appears they are speaking and are interactive. This can be achieved without any modification of objects or the environment."
"For this, we used a highly directional pan-tilt ultrasonic array. By modulating a 40 kHz ultrasonic signal, we can emit sound that is inaudible 'in flight' and demodulates to audible frequencies when impacting a surface through acoustic parametric interaction. This makes it appear as though the sound originates from an object and not the speaker."
This is not the first time ultrasonic parametric speakers have been paired with voice-activated assistant systems, though earlier efforts had arguably less noble goals: late last year, researchers published an attack against voice-activated assistants that used beams of focused ultrasonic audio from a parametric loudspeaker to send commands audible only at a precise crossover point.
For digital ventriloquism to work, however, the system, which pairs a parametric ultrasonic speaker array with a pair of servo motors driven by an Arduino Uno board, must know where its target objects are located. The team tried two approaches: in the first, a Raspberry Pi connected to ReSpeaker XYZ microphone arrays listens for the operator clapping or snapping their fingers next to each target object and calculates the sound's direction relative to the system; in the second, a camera is added so the YOLOv3 object detection system can locate objects automatically.
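For the camera-based variant, turning a YOLOv3 detection into servo angles is a small geometry step: the object's pixel centroid maps to pan and tilt offsets through a pinhole camera model, which the Pi can then forward to the Arduino. The Python sketch below shows one way that hand-off might look; the camera intrinsics, serial port, and the "P&lt;pan&gt; T&lt;tilt&gt;" firmware protocol are assumptions for illustration, not details taken from the paper.

```python
import math
import serial  # pyserial; assumed link to the Arduino Uno driving the servos

# Hypothetical camera intrinsics and serial protocol -- the Arduino firmware
# is assumed to parse lines of the form "P<pan> T<tilt>\n".
IMG_W, IMG_H = 1280, 720
FOCAL_PX = 1000.0  # focal length in pixels, from camera calibration
link = serial.Serial("/dev/ttyACM0", 115200, timeout=1)

def pixel_to_angles(cx: float, cy: float) -> tuple[float, float]:
    """Convert a detected object's pixel centroid (e.g., from YOLOv3)
    into pan/tilt offsets in degrees using a pinhole camera model."""
    pan = math.degrees(math.atan2(cx - IMG_W / 2, FOCAL_PX))
    tilt = math.degrees(math.atan2(IMG_H / 2 - cy, FOCAL_PX))
    return pan, tilt

def aim_at(cx: float, cy: float, center_pan: float = 90, center_tilt: float = 90):
    """Steer the pan-tilt ultrasonic array toward the target pixel."""
    pan, tilt = pixel_to_angles(cx, cy)
    link.write(f"P{center_pan + pan:.1f} T{center_tilt + tilt:.1f}\n".encode())

# e.g., aim at a mug the detector found centred at pixel (880, 410)
aim_at(880, 410)
```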
The system has proven promising in testing: "We ran a study in which we projected speech onto five objects in three environments," the team writes, "and found that participants were able to correctly identify the source object 92 percent of the time and correctly repeat the spoken message 100 percent of the time, demonstrating our digital ventriloquy is both directional and intelligible.
"While parametric audio does not aim to replace traditional speakers for music and entertainment, a ventriloquism approach has unique benefits with respect to embodiment and immersion that traditional speakers cannot offer. We hope this paper spurs future work in Digital Ventriloquism and uses of acoustic parametric interaction in HCI."
The team's work has been published under open access terms on the ACM Digital Library as part of the proceedings of the CHI Conference on Human Factors in Computing Systems (CHI'20).