This AR Interface Is Kind of Shady
CMU’s EclipseTouch lets AR headsets turn any surface into a touch interface by tracking infrared shadows, avoiding awkward midair gestures.
Augmented reality glasses are an emerging technology, and there is not yet a clear consensus on how they should be implemented. Among the many open questions is how their user interfaces should work. The predominant direction device developers are taking at the moment is to have users interact with virtual objects floating in the air using their hands. But while this approach may be the most natural way to interact with augmented reality, it is not always the best way.
Grabbing a 3D model and rotating it for a better view makes perfect sense, but what about typing an email? Using a virtual keyboard floating in the air would quickly lead to fatigue, and with no force feedback it would be a poor typing experience. So for use cases like these, a group of researchers at Carnegie Mellon University has created a new type of interface called EclipseTouch. Their system tracks the shadows cast by the hands to detect touch events on any ordinary surface.
The team’s prototype is built on a Meta Quest 3, with a small global shutter USB camera mounted to the headset. This camera, fitted with an infrared filter, operates at high frame rates and syncs with infrared LEDs to capture shadow information. The illuminators, arranged at the corners of the headset, emit at a wavelength of 850 nanometers, which is safe, invisible to humans, and ideal for producing crisp shadows on nearby surfaces. A Teensy 3.2 microcontroller and custom LED driver board manage the timing of the illuminators and the camera with microsecond precision, ensuring that clean, structured shadow data is produced.
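The exact strobe schedule is not spelled out in this description, but conceptually the firmware interleaves frames lit by each corner illuminator with frames in which every LED is off. A minimal host-side sketch of that schedule, using hypothetical set_leds() and grab_frame() helpers in place of the real LED-driver and camera interfaces, might look like this:

```python
# Illustrative capture loop for the interleaved illumination schedule.
# set_leds() and grab_frame() are placeholders for the actual Teensy/LED
# driver and global-shutter camera interfaces, which are not shown here.
from itertools import cycle

NUM_LEDS = 4  # one 850 nm illuminator at each corner of the headset

def set_leds(active_led):
    """Turn on a single LED, or none when active_led is None."""
    pass  # stand-in for the real driver call

def grab_frame():
    """Capture one frame from the infrared camera."""
    pass  # stand-in for the real camera call

def capture_cycle():
    """Yield (illumination_state, frame) pairs: each LED in turn, then all off."""
    states = list(range(NUM_LEDS)) + [None]  # None = ambient-only frame
    for state in cycle(states):
        set_leds(state)
        yield state, grab_frame()
```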
Of course, natural light sources like the sun and incandescent bulbs also emit infrared and can interfere with the shadows EclipseTouch depends on. To solve this, the system subtracts out any extraneous shadows by capturing frames with the LEDs turned off, then removing those contributions from the illuminated frames. This technique isolates the shadows that matter.
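In code, that amounts to differencing each illuminated frame against an ambient-only frame captured moments earlier. A minimal NumPy sketch of the idea, standing in for whatever the actual implementation does:

```python
import numpy as np

def remove_ambient_ir(lit_frame: np.ndarray, ambient_frame: np.ndarray) -> np.ndarray:
    """Subtract the ambient infrared contribution (sunlight, incandescent bulbs)
    from a frame captured with one of the headset's LEDs switched on.

    Both inputs are 8-bit grayscale images from the IR camera; the result keeps
    only the shadows cast by the headset's own illuminators.
    """
    diff = lit_frame.astype(np.int16) - ambient_frame.astype(np.int16)
    return np.clip(diff, 0, 255).astype(np.uint8)
```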
On the software side, a high-speed video pipeline combines frames from the different LEDs into a single composite image stream. From there, the system detects and tracks the fingertips, creating normalized image patches that are fed into a machine learning model. This hybrid vision transformer processes the data to estimate both touch state and hover distance. The system requires less than half a millisecond per inference on an Apple M2 processor, the same chip used in Apple’s Vision Pro headset.
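The published architecture details are not reproduced here, but the general shape of such a model, a convolutional stem feeding a small transformer encoder with one head for touch state and one for hover distance, can be sketched in PyTorch. Every layer size and name below is an illustrative assumption, not the researchers' actual network:

```python
import torch
import torch.nn as nn

class TouchHoverNet(nn.Module):
    """Illustrative hybrid vision transformer: a convolutional stem turns a
    normalized fingertip shadow patch into tokens, a transformer encoder mixes
    them, and two heads predict touch state and hover distance."""

    def __init__(self, patch_px=64, dim=128, depth=4, heads=4):
        super().__init__()
        # Convolutional stem: 64x64 patch -> 8x8 grid of feature tokens
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1),
        )
        num_tokens = (patch_px // 8) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.touch_head = nn.Linear(dim, 1)   # probability the finger is touching
        self.hover_head = nn.Linear(dim, 1)   # estimated hover distance

    def forward(self, patches):
        # patches: (batch, 1, 64, 64) normalized fingertip shadow crops
        tokens = self.stem(patches).flatten(2).transpose(1, 2) + self.pos_embed
        pooled = self.encoder(tokens).mean(dim=1)
        return torch.sigmoid(self.touch_head(pooled)), self.hover_head(pooled)
```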
EclipseTouch has been shown to work across a wide range of conditions. Whether the environment is pitch black or flooded with sunlight, and whether the surface is wood, metal, or skin, the system remains reliable. And because the required hardware (an infrared camera, LEDs, and a modest amount of compute power) is already standard in many headsets, EclipseTouch could be adopted without adding bulk or external devices.
Instead of being limited to midair gestures, EclipseTouch users could tap on any surface around them, instantly transforming it into a responsive interface. If adopted widely, that might mean that the table in front of you could be your next keyboard, no matter where you are.