A Joint Effort in Object Identification
MIT researchers taught robots to identify objects by feel alone — just a shake, a squeeze, and smart use of their joint encoders.
The average child has no trouble picking up a wrapped package from underneath the Christmas tree, giving it a little shake and squeeze, then confidently declaring what is contained within. Hints like the telltale rattle of a LEGO set, or the squishiness of a pair of socks (thanks, Grandma!), give it away every time. Robots, on the other hand, have a lot more difficulty identifying unknown objects in this way. Traditionally, they need a camera or other sensors to collect information before making a guess.
There is a problem with this approach, however. Loading a robot down with sensors greatly adds to its cost, and it also necessitates additional onboard processing hardware, which (you guessed it) further increases costs. Furthermore, some sensors, such as cameras, fail in low-light conditions, rendering them useless for certain applications. To meet the need for more economical and versatile object identification, researchers at MIT have proposed a new approach. Rather than relying on expensive hardware, their method repurposes components that most robots already have.
A key part of this process involves the use of joint encoders. These sensors are embedded in most robots' joints and measure rotational position and speed during movement. As the robot interacts with an object, the encoders record subtle variations in joint movement. For example, when lifting a heavy item, the robot’s joints won’t rotate as far or as quickly under the same amount of force compared to lifting something lighter. Similarly, squeezing a soft object will cause more joint flexion than squeezing a rigid one. By collecting this data, the robot builds a detailed picture of how the object responds to its actions.
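The team's code is not included here, but a rough sketch shows what this kind of proprioceptive data looks like. The snippet below is purely illustrative, with made-up encoder traces rather than any real robot API: it logs a joint angle during a pretend squeeze and boils it down to a couple of simple features.

```python
import numpy as np

# Hypothetical encoder log: joint angle (rad) sampled at 1 kHz while the
# gripper squeezes an object with a constant commanded torque.
# A real robot would read this from its joint encoders; here it is faked.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
soft_object = 0.30 * (1 - np.exp(-t / 0.1))   # joint closes a long way
rigid_object = 0.05 * (1 - np.exp(-t / 0.1))  # joint barely moves

def proprioceptive_features(angles, dt):
    """Summarize one squeeze using encoder data alone."""
    velocity = np.gradient(angles, dt)
    return {
        "total_deflection_rad": angles[-1] - angles[0],  # how far the joint closed
        "peak_velocity_rad_s": float(velocity.max()),    # how fast it moved
    }

print("soft :", proprioceptive_features(soft_object, dt))
print("rigid:", proprioceptive_features(rigid_object, dt))
```

For the same commanded torque, the soft object produces a much larger deflection and a higher peak velocity than the rigid one, and that difference is exactly the kind of clue the robot can mine from its own joints.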
To make sense of the data, the researchers use a technique called differentiable simulation. The system builds digital models of both the robot and the object, then simulates the interaction under an initial guess about properties like the object's mass and stiffness. Because the simulation is differentiable, the system can calculate how changing those assumed properties would change the predicted joint readings, and it uses that gradient to nudge its estimates until the simulation matches what the encoders actually recorded. The entire process takes only a few seconds and can run on a standard laptop.
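To make the idea concrete, here is a minimal, hypothetical sketch of that gradient-fitting loop. The "simulation" is a deliberately tiny one-joint model with assumed values for arm length and joint stiffness, not the researchers' actual simulator, but it shows how a derivative of the simulation lets the system walk an estimate of an object's mass toward whatever value explains the encoder reading.

```python
G = 9.81        # gravity (m/s^2)
ARM_LEN = 0.4   # distance from joint to object (m) -- assumed for illustration
JOINT_K = 50.0  # effective joint stiffness (N*m/rad) -- assumed for illustration

def simulate_deflection(mass, applied_torque):
    """Toy simulation: predicted steady-state joint deflection (rad)
    when holding an object of the given mass against gravity."""
    return (applied_torque - mass * G * ARM_LEN) / JOINT_K

def d_deflection_d_mass():
    """Analytic derivative of the simulation with respect to mass --
    this is what makes the simulation 'differentiable'."""
    return -G * ARM_LEN / JOINT_K

# Pretend the robot applied 6 N*m and its encoders measured a 0.041 rad deflection.
applied_torque = 6.0
observed_deflection = 0.041

mass_estimate = 0.1   # initial guess (kg)
lr = 5.0              # gradient-descent step size
for _ in range(200):
    error = simulate_deflection(mass_estimate, applied_torque) - observed_deflection
    grad = 2 * error * d_deflection_d_mass()   # d(error^2) / d(mass)
    mass_estimate -= lr * grad

print(f"estimated mass: {mass_estimate:.3f} kg")  # converges to roughly 1.0 kg
```

The researchers' models are of course far more detailed than this single spring-loaded joint, but the principle is the same: simulate, compare against the encoder data, and follow the gradient to the object's properties.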
Looking ahead, the team plans to explore combining their new technique with traditional computer vision-based methods. They believe the two together could give robots much more powerful and robust sensing capabilities.
Ultimately, this low-cost, data-efficient approach could help robots function in environments where cameras fail or complex sensors are impractical. Whether working in disaster recovery, low-light warehouses, or cluttered households, robots that learn by touch may finally be able to keep up with the cleverness of a child at Christmastime.