Physical Intelligence's Vision-Language-Action Model, π₀.₅, Delivers a More Generalized Robot Brain
A new approach to training data gives this VLA the ability to direct a robot through complex tasks in entirely unfamiliar environments.
Physical Intelligence, a startup aiming to bring general-purpose artificial intelligence into the physical world, has announced a new model that it claims allows assistive robots to generalize, enabling a household robot, for example, to work in any house, whether or not it has been trained on that home's layout.
"The biggest challenge in robotics is not in performing feats of agility or dexterity, but generalization: the ability to figure out how to correctly perform even a simple task in a new setting or with new objects," the company explains of its work. "Imagine a robot that needs to clean your home: every home is different, with different objects in different places. This is why most commercial robots operate in tightly controlled environments like factories or warehouses: in a world where the robot never needs to venture outside of a single building and where the objects and their locations are predetermined, current robotic methods that provide for only weak generalization can be very successful."
What works in the rigid environment of an automated warehouse, though, isn't going to work in the wider world — and it certainly won't deliver the kind of pick-up-and-play future of commercial robotics, where a user can buy a robot and have it working in their home on the same day. For that, a new approach is required; Physical Intelligence says that the latest version of its vision-language-action (VLA) model, π₀.₅, is a step along the path to exactly that.
"In our experiments," the company says, "π₀.₅ can perform a variety of tasks in entirely new homes. It does not always succeed on the first try, but it often exhibits a hint of the flexibility and resourcefulness with which a person might approach a new challenge. The individual tasks that π₀.₅ performs vary in difficulty, from rearranging objects (e.g., to put dishes in the sink) to much more intricate behaviors, such as using a sponge to wipe down a spill."
The trick to the model's success: co-training on heterogeneous data from a variety of sources. The result is a model that appears more generalized than its competitors, though at some cost to precision and dexterity. "There is a lot left to do," the company admits. "While our robots can improve from verbal feedback, they could also in the future utilize their autonomous experience to get better with even less supervision, or they could explicitly request help or advice in unfamiliar situations. There is also a lot left to do to improve transfer of knowledge, both in the technical aspects of how the models are structured, and in the diversity of data sources that our models can employ."
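Physical Intelligence has not released training code for π₀.₅, but the core idea of co-training can be illustrated with a short sketch: each training example is drawn from one of several heterogeneous data sources according to mixture weights, so every batch blends them rather than cycling through one dataset at a time. The source names and weights below are illustrative placeholders, not the company's actual data recipe.

```python
import random
from typing import Dict, Iterator, List

# Hypothetical data sources and mixture weights -- purely illustrative,
# not the actual datasets or proportions used to train pi-0.5.
SOURCES: Dict[str, float] = {
    "web_vision_language": 0.4,      # captioning / VQA-style examples
    "cross_embodiment_robot": 0.3,   # demonstrations from other robot platforms
    "mobile_manipulation": 0.2,      # in-home mobile manipulator demos
    "verbal_instructions": 0.1,      # language subtask annotations
}


def example_stream(name: str) -> Iterator[dict]:
    """Stand-in for a real dataset loader; yields placeholder examples."""
    i = 0
    while True:
        yield {"source": name, "example_id": i}
        i += 1


def cotraining_batches(batch_size: int, seed: int = 0) -> Iterator[List[dict]]:
    """Draw each example from a source chosen by the mixture weights,
    so every batch mixes heterogeneous data."""
    rng = random.Random(seed)
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    streams = {n: example_stream(n) for n in names}
    while True:
        picks = rng.choices(names, weights=weights, k=batch_size)
        yield [next(streams[n]) for n in picks]


if __name__ == "__main__":
    # Pull one mixed batch and show which sources it was drawn from.
    batch = next(cotraining_batches(batch_size=8))
    for ex in batch:
        print(ex)
```

In a real training loop, each source would map to a differently formatted dataset (images paired with text, robot trajectories paired with actions), with the model's shared vision-language backbone consuming all of them; the sampling pattern above is only meant to show how a single mixture can span those sources.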
More information is available on the Physical Intelligence website, while a preprint describing the company's research is available on Cornell's arXiv server under open-access terms. The company has also published its earlier π₀ and π₀-FAST models on GitHub under the permissive Apache 2.0 license, but at the time of writing π₀.₅ had not been publicly released.