Google Aims at Intelligent Robotics, Heralds a Milestone Towards "AGI" with Gemini Robotics 1.5
The company's new vision-language-action model is only available to "select partners," but Gemini Robotics-ER 1.5 is accessible to all now.
Google has announced two new models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, which it claims mark a step towards the creation of "intelligent, truly general-purpose robots," though at the time of writing only the latter was generally available.
"Earlier this year, we made incredible progress bringing Gemini's multimodal understanding into the physical world, starting with the Gemini Robotics family of models," Google's Carolina Parada claims of the initial launch of Gemini Robotics. "Today, we're taking another step towards advancing intelligent, truly general-purpose robots. We're introducing two models that unlock agentic experiences with advanced thinking: Gemini Robotics 1.5 [and] Gemini Robotics-ER 1.5."
The models, of course, do not "think," though alongside "reasoning" the term has become a staple of the marketing materials underpinning the artificial intelligence boom. Gemini Robotics 1.5 is a vision-language-action model which, like the large language model at the heart of Google's Gemini platform on which both new models are built, turns its input into a token stream and outputs the most statistically likely continuation tokens as a response: effectively a complex and power-hungry form of autocomplete.
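The "most statistically likely continuation" idea can be illustrated with a deliberately tiny sketch: a bigram model built from a toy corpus, greedily emitting whichever token most often followed the previous one. This is purely illustrative and bears no resemblance to Gemini's actual transformer architecture; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus of robot-instruction tokens (invented for illustration).
corpus = "pick up the red block and place the red block in the bin".split()

# Count how often each token follows each other token (bigram counts).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def continue_tokens(token, steps):
    """Greedily append the most frequent next token, step by step."""
    out = [token]
    for _ in range(steps):
        counts = bigrams[out[-1]]
        if not counts:
            break  # no known continuation for this token
        out.append(counts.most_common(1)[0][0])
    return out

print(continue_tokens("the", 2))  # → ['the', 'red', 'block']
```

Production models replace the frequency table with billions of learned parameters and sample over a vast vocabulary, but the output mechanism is the same in kind: the next token is chosen by statistical likelihood, not by anything resembling deliberation.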
In the case of Gemini Robotics 1.5, that stream of continuation tokens forms an output that gives the entirely illusory impression of an entity that, in Parada's words, "thinks before taking action and shows its process," turning visual information and natural-language instructions into control commands for a robot's motors.
Gemini Robotics-ER 1.5, meanwhile, is a vision-language model that, again in Parada's words, "reasons about the physical world," though, again, this is merely an illusion, as no actual reasoning takes place. The model, Parada claims, "creates detailed, multi-step plans to complete a mission" and, in a nod to the current trend towards "agentic AI," is able to call external digital tools in order to finish a given task.
"Gemini Robotics 1.5 marks an important milestone towards solving AGI [Artificial General Intelligence] in the physical world," Parada continues, a bold and entirely unproven claim. "By introducing agentic capabilities, we're moving beyond models that react to commands and creating systems that can truly reason, plan, actively use tools and generalize. This is a foundational step toward building robots that can navigate the complexities of the physical world with intelligence and dexterity, and ultimately, become more helpful and integrated into our lives."
At the time of writing, Google had only publicly released Gemini Robotics-ER 1.5, which is available through the Gemini API to Google AI Studio members today; Gemini Robotics 1.5, meanwhile, is only being made available to "select partners." More information is available on the Google for Developers blog.