VDI Enables Intuitive Robot Instruction
MIT's VDI lets anyone teach robots new tricks via live demos, teleoperation, or direct guidance. No AI or coding experience is needed.
Neural networks, backpropagation, activation functions — for those unfamiliar with how artificial intelligence (AI) algorithms work, it can be a very confusing topic. But now more than ever, it seems clear that AI is here to stay, and that we will all need to learn to work with this technology. Practically speaking, how might that work? It is unreasonable to expect that everyone will become an expert in mathematics, computer science, data science, and everything else necessary to be proficient in the field.
Robotics, in particular, has been identified as one area where more AI know-how will be needed in the near future. In warehouses, industrial settings, and beyond, workers will need to teach these machines to do their jobs and do them well. To sidestep many of the complexities associated with training an AI algorithm, frameworks have been developed to help robots learn new tasks from demonstrations. These demonstrations are typically given by letting the robot watch a human perform the task, by teleoperating the robot, or by physically guiding it through the motions.
In dynamic environments, where a robot must frequently take on new tasks, all of these demonstration methods may be needed for adequate performance. However, existing frameworks generally support only a single method. To address this limitation, a group of engineers at MIT CSAIL has developed a new tool called the Versatile Demonstration Interface (VDI). It is designed to make it easy for anyone to train a robot by allowing any of the three most popular demonstration methods to be used as needed.
VDI is a sensor-equipped, handheld attachment that mounts to a collaborative robotic arm via the DIN ISO 9409-1-A50 standard interface. Its modular design supports remote teleoperation, kinesthetic guidance (physically guiding the robot), and natural demonstrations in which the robot watches a user perform a task using a detached tool.
The system includes a camera for visual tracking, a set of AprilTags for pose estimation, and a uni-axial force sensor (a DYMH-103 load cell with an HX711 amplifier) to measure applied force. This setup captures the spatial and force data necessary for learning. In teleoperation mode, a 6-degree-of-freedom SpaceMouse is used to control the robot remotely. Because the SpaceMouse provides no haptic feedback, the system instead communicates force limits through LEDs and audio alerts.
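The article does not detail how those alerts are triggered, but a simple monitoring loop gives the idea. The sketch below is a minimal illustration, assuming hypothetical thresholds and stubbed-out hardware calls in place of the real HX711 read and LED/audio drivers.

```python
# Hypothetical sketch: surfacing force-limit warnings to a teleoperator who
# has no haptic feedback. Sensor reads and alert outputs are stubbed; the
# real VDI reads a DYMH-103 load cell through an HX711 amplifier.

import random
import time

FORCE_WARN_N = 15.0   # assumed warning threshold, in newtons
FORCE_LIMIT_N = 25.0  # assumed hard limit, in newtons

def read_tool_force_n() -> float:
    """Stand-in for an HX711 reading converted to newtons."""
    return random.uniform(0.0, 30.0)

def set_led(color: str) -> None:
    """Stand-in for driving the interface's status LEDs."""
    print(f"LED -> {color}")

def play_tone(kind: str) -> None:
    """Stand-in for the audio alert."""
    print(f"TONE -> {kind}")

def teleop_force_monitor(poll_hz: float = 50.0, cycles: int = 5) -> None:
    """Poll the tool-side force sensor and escalate alerts as limits near."""
    for _ in range(cycles):
        force = read_tool_force_n()
        if force >= FORCE_LIMIT_N:
            set_led("red")
            play_tone("alarm")       # operator should back off immediately
        elif force >= FORCE_WARN_N:
            set_led("yellow")
            play_tone("warning")     # approaching the contact-force limit
        else:
            set_led("green")
        time.sleep(1.0 / poll_hz)

if __name__ == "__main__":
    teleop_force_monitor()
```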
For kinesthetic teaching, the VDI uses the UR5e robotic arm’s gravity compensation mode, allowing users to physically guide the robot. The team also built in tool-side force sensing, which catches discrepancies between the robot’s internal force estimates and the externally measured forces, improving the accuracy of the recorded demonstrations.
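As a toy illustration of that cross-check, the sketch below compares an arm's internally estimated contact forces against external load-cell readings over a recorded demonstration and flags samples where they disagree. The 2 N tolerance and the sample values are assumptions for illustration, not figures from the VDI implementation.

```python
# Toy cross-check between the arm's internal force estimate and the
# tool-side load-cell measurement at each timestep of a demonstration.

DISCREPANCY_TOLERANCE_N = 2.0  # assumed acceptable disagreement, in newtons

def flag_discrepancies(internal_n, external_n, tol=DISCREPANCY_TOLERANCE_N):
    """Return indices of samples where the two force readings disagree."""
    return [i for i, (est, meas) in enumerate(zip(internal_n, external_n))
            if abs(est - meas) > tol]

# Internal estimates come from the robot's own sensing; external values from
# the tool-side load cell. The numbers here are illustrative only.
internal = [0.2, 3.1, 7.8, 12.4, 9.0]
external = [0.3, 3.0, 10.2, 12.1, 9.4]

print(flag_discrepancies(internal, external))  # [2]: sample 2 disagrees by 2.4 N
```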
In natural demonstration mode, the interface detaches from the robot via a pin mechanism. Once removed, a camera mounted on the robot automatically tracks the tool using its AprilTag markers, with pose estimation handled by an Extended Kalman Filter. A nonlinear optimization algorithm then adjusts the camera’s viewpoint using four objectives: maintaining an optimal tool-camera distance, centering the tool in view, encouraging a neutral camera pose, and constraining camera angles.
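To make the idea concrete, here is a minimal sketch of a cost function combining those four objectives, minimized with an off-the-shelf optimizer. The weights, the 0.5 m target distance, the neutral pose, and the angle limits are all illustrative assumptions rather than the VDI's actual formulation or parameters.

```python
# Sketch of a viewpoint-optimization cost with four weighted terms:
# tool-camera distance, tool centering, closeness to a neutral pose, and a
# soft constraint on camera angles. All constants are assumptions.

import numpy as np
from scipy.optimize import minimize

TOOL_POS = np.array([0.6, 0.1, 0.3])                  # tracked tool position (e.g., from the EKF)
NEUTRAL_POSE = np.array([0.3, 0.0, 0.6, 0.0, -0.4])   # camera x, y, z, pan, tilt
TARGET_DIST = 0.5                                     # assumed ideal tool-camera distance (m)
ANGLE_LIMIT = np.deg2rad(60)                          # assumed pan/tilt bound (rad)
W_DIST, W_CENTER, W_NEUTRAL, W_ANGLE = 1.0, 2.0, 0.1, 5.0

def view_direction(pan: float, tilt: float) -> np.ndarray:
    """Unit vector the camera looks along for a given pan/tilt."""
    return np.array([np.cos(tilt) * np.cos(pan),
                     np.cos(tilt) * np.sin(pan),
                     np.sin(tilt)])

def cost(pose: np.ndarray) -> float:
    pos, pan, tilt = pose[:3], pose[3], pose[4]
    to_tool = TOOL_POS - pos
    dist = np.linalg.norm(to_tool)

    # 1) keep the camera near its ideal distance from the tool
    dist_term = (dist - TARGET_DIST) ** 2
    # 2) keep the tool centered: penalize the angle between the view axis
    #    and the camera-to-tool direction
    cos_off = np.clip(np.dot(view_direction(pan, tilt), to_tool / dist), -1.0, 1.0)
    center_term = np.arccos(cos_off) ** 2
    # 3) stay close to a neutral, comfortable camera pose
    neutral_term = np.sum((pose - NEUTRAL_POSE) ** 2)
    # 4) soft constraint keeping pan and tilt inside their limits
    angle_term = (max(0.0, abs(pan) - ANGLE_LIMIT) ** 2
                  + max(0.0, abs(tilt) - ANGLE_LIMIT) ** 2)

    return (W_DIST * dist_term + W_CENTER * center_term
            + W_NEUTRAL * neutral_term + W_ANGLE * angle_term)

result = minimize(cost, NEUTRAL_POSE, method="Nelder-Mead")
print("optimized camera pose:", np.round(result.x, 3))
```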
To evaluate VDI’s real-world effectiveness, the team conducted a user study at a local manufacturing innovation center. Manufacturing experts used all three teaching modes to train a robot on two tasks: press-fitting pegs and molding a pliable material around a rod. The participants in the study reported that VDI increased training flexibility and that it would expand the types of users that could interact with robots. If these results pan out in the real world, VDI could bring us closer to a future where teaching robots is as intuitive as teaching another person.