These Robots Will Get You Into Tight Spaces
MIT researchers showed that diffusion models can be used to teach robots to pack a variety of objects into some very tight spaces.
Some people can simply look at a pile of disparate objects and see exactly how to fit them all together into a tight space, like a three-dimensional jigsaw puzzle. Others, on the other hand, struggle to fit even the simplest of items into a box, and their car trunks or the backs of their trucks often end up looking like a disaster zone. Spatial visualization oftentimes seems like it is a natural ability that some of us are just born with. And those that are not may be forever doomed to struggle with turning chaos into order.
Given that this skill can be so difficult for us to learn, you might imagine that robots have a much harder time yet. After all, they have no natural abilities. We must teach them everything that they know, and we are not especially good at teaching good spatial visualization skills to others. This will only become more of a problem in the years to come as autonomous robots take on more and more tasks in warehouses and packing and shipping operations.
To solve this problem, many constraints must be considered. These range from the type of grasp to make to where objects should be placed and the avoidance of collisions. At present, robotic manipulation planning systems generally consider these constraints one at a time. After each model proposes the next step, the result is checked against all of the other models for constraint violations. This sort of sequential process is very inefficient, and the amount of time it takes to arrive at a solution is often impractical for real-world applications.
Leveraging some recent advancements in artificial intelligence, a group of researchers at MIT and Stanford University arrived at a different solution to the problem of endowing robots with spatial visualization capabilities. The team believed that a unified model could more efficiently attack this problem, but realized that the overhead associated with training a monolithic model would be too great. Instead, they trained a separate machine learning model for each constraint, but designed their approach such that all constraints could still be taken into account at one time.
The new technique, called Diffusion-CCSP, leverages a diffusion model to represent each constraint. These diffusion models are of the same type as those that have gained popularity for their ability to turn text prompts into creative images (e.g. Stable Diffusion). The process begins with a randomly generated solution to the problem. Each of the models contributes to a refined version of that initial solution, taking into consideration its own constraints. This iterative process continues until an acceptable solution has been found.
Using this system, the researchers discovered that better solutions could be found than those produced by existing methods. Moreover, the solutions were generated much more quickly, demonstrating that Diffusion-CCSP is more practical for real-world use cases.
Avoiding a single, monolithic model did reduce the amount of training data that was required, however, a large number of examples were still needed. Manually generating sufficient examples to train each model would still have been prohibitively time-consuming, so the team instead turned to computer simulations for assistance. They developed algorithms that allowed them to very rapidly generate simulated data consisting of a wide variety of objects that were tightly packed together. This information provided the models with an idea of what good solutions look like, such that they would have a goal to work towards.
The Diffusion-CCSP system was deployed on a physical robot to prove that the methods could be translated into results in the real world. At present, the researchers are working to make Diffusion-CCSP capable of working in more complex scenarios in preparation for future deployments.