Human-Like Sensory System Improves Robot Navigation
Duke’s WildFusion gives robots human-like senses, helping them better understand and navigate difficult environments.
As far as they have come in recent decades, robots are still lumbering and clumsy beasts compared to humans. Even when equipped with ultra-high-resolution imaging systems and onboard computers capable of hundreds of trillions of operations per second, robots simply cannot move with anything like our level of agility. A major reason for this gap is that we do not rely on vision alone to get around. We also pick up on cues from our other senses, like touch and hearing, to adapt the way we move to the environment we find ourselves in.
Consider the task of walking on loose gravel, for example. It is not so much vision that helps us alter our stride to avoid slipping, but the feeling of our feet sliding with each step. This comes completely naturally to us, but it is a rare robot that can not only sense this information but also use it to adjust how it interacts with the world. A team of engineers at Duke University recognized just how crucial these additional sources of information are, so they developed a multimodal system to help robots better understand the world.
Called WildFusion, the team’s approach integrates signals from LiDAR, an RGB camera, contact microphones, tactile sensors, and an IMU into a 3D scene reconstruction algorithm. Using this information, robots can understand the objects around them and are better equipped to plan a path from one point to another.
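To give a flavor of what such a fusion step consumes, the sketch below bundles the five modalities into a single per-timestep record. The field names, shapes, and sampling rates are illustrative assumptions, not WildFusion's actual interfaces.

```python
# Minimal sketch of a multimodal sensor bundle a WildFusion-style pipeline
# might fuse. All field names and shapes are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class SensorFrame:
    lidar_points: np.ndarray   # (N, 3) point cloud from one LiDAR sweep
    rgb_image: np.ndarray      # (H, W, 3) camera frame
    foot_audio: np.ndarray     # (4, T) contact-microphone waveforms, one per foot
    foot_force: np.ndarray     # (4,) tactile force reading per foot
    imu_accel: np.ndarray      # (3,) linear acceleration
    imu_gyro: np.ndarray       # (3,) angular velocity

def make_dummy_frame() -> SensorFrame:
    """Build a synthetic frame for exercising a fusion pipeline."""
    return SensorFrame(
        lidar_points=np.random.rand(2048, 3),
        rgb_image=np.zeros((480, 640, 3), dtype=np.uint8),
        foot_audio=np.zeros((4, 1600)),
        foot_force=np.zeros(4),
        imu_accel=np.zeros(3),
        imu_gyro=np.zeros(3),
    )
```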
Robotic navigation systems have long relied on visual data from cameras or LiDAR alone. While effective in controlled or structured environments, these sensors often fall short when faced with the unpredictability of real-world, unstructured settings like forests, disaster zones, or remote terrain. Sparse data, fluctuating lighting, moving objects, and uneven surfaces can easily confuse them.
WildFusion seeks to change that by mimicking the way humans gather and interpret multisensory data. For instance, the system’s contact microphones detect the acoustic vibrations generated by each step a robot takes. Whether it is the crunch of dry leaves or the squish of mud, these audio cues provide important information about the type and stability of the ground. Tactile sensors measure the force on each robotic foot, revealing how slippery or solid the terrain might be. The inertial measurement unit, on the other hand, helps to gauge how much the robot is wobbling or tilting as it moves.
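To make those non-visual cues concrete, here are a few simple statistics of the kind each channel can supply. WildFusion itself learns its features from the raw signals; the hand-crafted heuristics below are only an assumed illustration of what the audio, tactile, and inertial streams carry.

```python
# Illustrative hand-crafted cues from the non-visual channels. These simple
# statistics only hint at the information each modality provides; they are
# not WildFusion's learned features.
import numpy as np

def footstep_audio_energy(waveform: np.ndarray) -> float:
    """RMS energy of a contact-microphone clip; crunchy gravel tends to be
    louder and broader-band than soft mud."""
    return float(np.sqrt(np.mean(waveform ** 2)))

def slip_score(force_trace: np.ndarray) -> float:
    """Relative drop from peak contact force within one stance phase;
    a large drop hints that the foot lost its grip."""
    peak = force_trace.max()
    return float((peak - force_trace[-1]) / (peak + 1e-6))

def wobble_index(gyro_window: np.ndarray) -> float:
    """Variance of angular velocity over a short window, shape (T, 3);
    higher values mean the body is tilting and rocking on unstable ground."""
    return float(gyro_window.var(axis=0).sum())
```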
All of these inputs are then fused into a single, continuous scene representation using a technique based on implicit neural representations. Unlike conventional 3D mapping methods that piece together visual data into point clouds or voxels, WildFusion uses deep learning to model the environment as a seamless surface. This allows the robot to fill in the blanks when visual data is missing or unclear, much like we do.
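As a rough idea of how an implicit neural representation works, the sketch below conditions a small PyTorch MLP on a fused multimodal feature vector, so the robot can query occupancy and traversability at arbitrary 3D points rather than only where a sensor happened to return data. The layer sizes, inputs, and outputs are assumptions for illustration, not the published WildFusion architecture.

```python
# Generic implicit-representation sketch: an MLP maps a 3D query point plus a
# fused multimodal feature vector to occupancy and traversability scores.
# Dimensions and outputs are illustrative assumptions.
import torch
import torch.nn as nn

class ImplicitScene(nn.Module):
    def __init__(self, feature_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # [occupancy logit, traversability logit]
        )

    def forward(self, xyz: torch.Tensor, fused_feat: torch.Tensor) -> torch.Tensor:
        # xyz: (B, 3) query points; fused_feat: (B, feature_dim) multimodal
        # context pooled from LiDAR, camera, audio, tactile, and IMU inputs.
        return torch.sigmoid(self.mlp(torch.cat([xyz, fused_feat], dim=-1)))

# Because the scene is a continuous function, the robot can evaluate terrain
# even at points its camera or LiDAR never observed, filling in the blanks
# from the other senses.
model = ImplicitScene()
points = torch.rand(16, 3)
features = torch.rand(16, 128)
occ_and_trav = model(points, features)  # (16, 2), values in [0, 1]
```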
The system was put to the test in the challenging environment of Eno River State Park in North Carolina. There, a four-legged robot equipped with WildFusion successfully navigated dense forests, grassy fields, and gravel trails. Not only was it able to walk with greater confidence, but it also demonstrated the ability to choose safer and more efficient paths.
Looking ahead, the team plans to expand WildFusion by incorporating even more types of sensors, such as thermal imagers and humidity detectors. With its flexible and modular architecture, the system holds promise for a wide range of applications, from disaster response and remote infrastructure inspections to autonomous exploration of unfamiliar terrain.