AI That Will Push Your Buttons
A new machine learning algorithm teaches autonomous robots how to interact with the real world in meaningful ways.
In recent years, advances in machine learning have produced algorithms that are, in some cases, fantastically good at recognizing objects in images. This is a step toward building autonomous robots that can interact with their environment. However, there is still a gulf between recognizing an object and understanding how to meaningfully interact with it.
A collaboration between researchers at Stanford University and Facebook has produced a new technique that may begin to bridge this gulf. Called Where2Act, the method takes image and depth data as input and predicts the actions that can be performed on the objects it finds.
The team trained a neural network that, when supplied with an image, makes a per-pixel prediction of where to interact, how to interact, and what the outcome of that interaction will be. For example, given an image of a desk drawer, the model may predict that a parallel pulling action on the handle will open the drawer. Possible actions are assigned "actionability" scores and ranked, so that a robot can prioritize the interactions most likely to succeed.
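To make that idea concrete, here is a minimal sketch, in PyTorch, of what a per-pixel prediction model of this kind could look like. The tiny convolutional backbone, the head shapes, and the feature dimensions are my own illustrative assumptions, not the Where2Act architecture itself.

```python
# Minimal sketch of per-pixel "where to act" predictions. Illustrative only:
# the backbone, heads, and dimensions are assumptions, not Where2Act's
# actual architecture.
import torch
import torch.nn as nn

class PerPixelActionModel(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Tiny convolutional backbone: maps an RGB-D image (4 channels)
        # to a per-pixel feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Head 1: per-pixel "actionability" score (how promising it is
        # to interact at this pixel at all).
        self.actionability = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # Head 2: per-pixel action proposal, encoded here simply as a
        # unit direction vector for the interaction.
        self.proposal = nn.Conv2d(feat_dim, 3, kernel_size=1)
        # Head 3: predicted likelihood that the proposed action succeeds.
        self.success = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, rgbd):
        feats = self.backbone(rgbd)
        score = torch.sigmoid(self.actionability(feats))          # (B, 1, H, W)
        direction = nn.functional.normalize(self.proposal(feats), dim=1)
        success = torch.sigmoid(self.success(feats))               # (B, 1, H, W)
        return score, direction, success

# Rank pixels by actionability and pick the most promising place to act.
model = PerPixelActionModel()
rgbd = torch.rand(1, 4, 128, 128)   # fake RGB-D input for demonstration
score, direction, success = model(rgbd)
best = torch.argmax(score.flatten())
y, x = divmod(best.item(), 128)
print(f"Most actionable pixel: ({x}, {y}), predicted success "
      f"{success.flatten()[best].item():.2f}")
```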
An interactive environment was built in the 3D simulator SAPIEN, both to collect training data and to test the resulting model. The simulation covered six interaction types (pushing, pushing-up, pushing-left, pulling, pulling-up, pulling-left) across 972 shapes spanning 15 commonly seen indoor object categories.
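The data-collection loop itself is simple to sketch: place an articulated object in the simulator, sample a pixel and one of the six interaction primitives, execute it, and record whether a part of the object actually moved. The snippet below illustrates that structure against a made-up `StubEnv` wrapper; it is not SAPIEN's real API, just a stand-in so the loop runs end to end.

```python
# Illustrative random-interaction data-collection loop. `StubEnv` is a
# hypothetical stand-in for a simulator wrapper (not SAPIEN's actual API);
# it fakes observations and outcomes so the loop structure is runnable.
import random
from dataclasses import dataclass

ACTIONS = ["push", "push-up", "push-left", "pull", "pull-up", "pull-left"]

@dataclass
class Observation:
    height: int
    width: int

class StubEnv:
    """Fake simulator: fixed-size observations, random interaction outcomes."""
    def reset(self):
        return Observation(height=128, width=128)

    def interact(self, pixel, action):
        # A real simulator would execute the primitive and check whether an
        # articulated part moved; here we just flip a biased coin.
        return random.random() < 0.2

def collect_samples(env, num_samples=1000):
    """Sample random (pixel, action) pairs and label them by outcome."""
    dataset = []
    for _ in range(num_samples):
        obs = env.reset()
        pixel = (random.randrange(obs.height), random.randrange(obs.width))
        action = random.choice(ACTIONS)
        success = env.interact(pixel, action)
        dataset.append({"obs": obs, "pixel": pixel,
                        "action": action, "success": success})
    return dataset

data = collect_samples(StubEnv(), num_samples=10)
print(f"Collected {len(data)} interaction samples; "
      f"{sum(d['success'] for d in data)} succeeded.")
```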
Evaluation of the model showed promise for the Where2Act approach: pulling actions tended to be strongly associated with high-curvature regions (e.g. part boundaries and handles), while pushing actions tended to be associated with flat surfaces. These findings carried over to novel, unseen object categories and to real-world data, indicating that the model generalized beyond what it observed during training.
There is still work to be done before Where2Act has an impact outside the world of academic research. The set of available interactions is currently very limited, and only a single image frame can be used as input, both of which reduce the number of practical use cases for the technique. With future advancements, I look forward to the day when in-home robots powered by Where2Act can finally put an end to microwaves flashing twelve o'clock all around the world.