With recent advancements in Vision-Language-Action (VLA) models and Vision-Language Models (VLMs), embodied AI is transforming how robots understand and interact with home environments. These models enable robots to semantically understand user intent, perform continuous planning, and adapt to real-world tasks through natural demonstrations.
In this project, we combine a LeRobot SO-100 ARM with a quadruped robot platform to create an autonomous home cleaning companion. Inspired by Vector Wang (creator of XLeRobot), who demonstrated remote teleoperation over HTTP, we've built a system that lets anyone train a robot dog from anywhere in the world. This is the first Unitree and LeRobot co-active model, and we plan to open-source the affordable LeRobot arm integration with the Unitree platform for research and personal use.
The Concept (Train That Dog)
Our robot dog uses its LeRobot arm as an intelligent "tail" that detects objects on the floor, commands the dog to sit, and picks them up, from toys to trash to pet waste. The key innovation is making robot training accessible to everyone through remote teleoperation.
The project involves:
- Mobile manipulation platform - Integrating a LeRobot SO-100 ARM onto a quadruped robot for dynamic object retrieval
- Computer vision detection - Using VLMs to identify and localize objects requiring pickup
- Remote teleoperation interface - Building a web application at trainthat.dog where users can control and train the robot from their office, phone, or anywhere with internet
- VLA model training - Using the ACT transformer model (April 2023), which performs end-to-end imitation learning directly from real demonstrations collected with a custom teleoperation interface. ACT is well suited to high-precision domains and requires a relatively small number of training steps. We collect demonstration data remotely and fine-tune vision-language-action models to understand user intent and execute collaborative pick-and-place tasks
- Synthetic data generation - Augmenting training datasets to improve generalization across different home environments and object types
- Continuous planning - Enabling the robot to autonomously navigate, detect, approach, and manipulate objects through real-time decision-making (a rough sketch of this loop is shown below)
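The detect, approach, sit, and pick behavior described above can be summarized as a single control loop. The sketch below is only illustrative: the `vlm_detector`, `dog`, `arm`, and `pick_policy` interfaces are hypothetical stand-ins for the actual Unitree/LeRobot stack, not names from our codebase.

```python
import time

def cleaning_loop(camera, vlm_detector, dog, arm, pick_policy):
    """One hypothetical pass of the autonomous cleaning behavior:
    detect an object on the floor, walk to it, sit, and pick it up."""
    while True:
        frame = camera.read()                       # RGB frame from the onboard camera
        detections = vlm_detector(frame)            # VLM-based detector returns [(label, xyz), ...]
        if not detections:
            time.sleep(0.5)                         # nothing to pick up yet, keep patrolling
            continue

        label, target_xyz = detections[0]           # handle the nearest object first
        dog.walk_to(target_xyz, stop_distance=0.3)  # quadruped navigates until the object is in reach
        dog.sit()                                   # sitting gives the arm a stable base

        # Run the learned pick policy closed-loop until it reports the grasp is complete.
        done = False
        while not done:
            obs = {"image": camera.read(), "state": arm.joint_positions()}
            action = pick_policy.select_action(obs) # ACT predicts the next chunk of joint targets
            done = arm.execute(action)

        dog.stand()                                 # resume patrolling for the next object
```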
Hardware:
Assemble the SO100 Arm.
3D print the camera mount and attach it securely to the arm.
Software:
Install LeRobot.
Install the Hugging Face libraries on the Jetson Nano.
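A quick sanity check after installation, assuming lerobot was installed as a Python package on the Jetson Nano (e.g. via pip); package names and version attributes may differ between releases:

```python
# Verify that LeRobot and its Hugging Face dependencies import cleanly on the Jetson Nano.
import torch
import lerobot
import huggingface_hub

print("lerobot:", getattr(lerobot, "__version__", "unknown"))
print("huggingface_hub:", huggingface_hub.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True with a JetPack build of PyTorch
```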
2. Calibration
Run lerobot-calibration with the arm positioned at the midpoint.
Adjust each motor to its maximum rotation range for proper calibration.
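Conceptually, calibration records each joint's midpoint and its two end stops so that raw encoder ticks can be mapped to a normalized joint angle. The sketch below illustrates that idea only; it is not the LeRobot calibration routine, and the `bus.read_position` interface is a hypothetical stand-in for the servo bus driver.

```python
def calibrate_joint(bus, motor_id):
    """Record the midpoint and both end stops for one joint (conceptual sketch)."""
    input(f"Move joint {motor_id} to its midpoint, then press Enter ")
    mid = bus.read_position(motor_id)   # raw encoder ticks at the neutral pose

    input(f"Move joint {motor_id} to one end stop, then press Enter ")
    a = bus.read_position(motor_id)

    input(f"Move joint {motor_id} to the other end stop, then press Enter ")
    b = bus.read_position(motor_id)

    lo, hi = min(a, b), max(a, b)
    return {"id": motor_id, "mid": mid, "min": lo, "max": hi}

def ticks_to_normalized(ticks, cal):
    """Map raw ticks into [-1, 1] using the recorded range."""
    return 2.0 * (ticks - cal["min"]) / (cal["max"] - cal["min"]) - 1.0
```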
3. Teleoperation
Establish a connection between the leader and follower devices using lerobot-teleoperation.
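Because the project also exposes teleoperation over the web (trainthat.dog), the follower side needs a small HTTP bridge that accepts joint targets from the browser and forwards them to the arm. The snippet below is a minimal sketch of that idea, not the actual trainthat.dog backend; the Flask endpoint and the `FollowerArmStub` class are assumptions for illustration.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

class FollowerArmStub:
    """Stand-in for the SO-100 follower driver; replace with the real servo bus interface."""
    def send_joint_targets(self, positions):
        print("sending joint targets:", positions)

follower_arm = FollowerArmStub()

@app.post("/joints")
def set_joints():
    """Receive joint targets from the web UI and forward them to the follower arm."""
    targets = request.get_json()["positions"]   # e.g. six joint angles from the browser client
    follower_arm.send_joint_targets(targets)
    return jsonify(status="ok", echoed=targets)

if __name__ == "__main__":
    # Expose the endpoint so the remote web app can drive the arm over HTTP.
    app.run(host="0.0.0.0", port=8000)
```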
4. Recording
Use lerobot-recording to record datasets.
Dataset reference: Hugging Face Dataset – seeed_poopascoopa_v2
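Recorded episodes pushed to the Hub can be pulled back for inspection before training. A minimal sketch, assuming the `LeRobotDataset` class from the lerobot package (the exact import path differs between versions) and using a placeholder namespace for the dataset linked above:

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset  # path may differ by lerobot version

# Pull the teleoperated pickup episodes from the Hugging Face Hub.
# Replace <user> with the actual namespace of the dataset referenced above.
dataset = LeRobotDataset("<user>/seeed_poopascoopa_v2")

print("frames:", len(dataset))           # total number of recorded timesteps
sample = dataset[0]                      # one timestep: camera image(s), joint state, action
print({k: getattr(v, "shape", type(v)) for k, v in sample.items()})
```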
5. Isaac Simulation
A simulation was run in Isaac Sim to record simulation datasets.
The ACT model was trained on an RTX 4090 GPU for 3 hours (10,000 steps).
Trained model available at: seeed_poopascoopa_v4
Deploy the trained model on the Jetson Nano.
Enable real-time video streaming for live inference and monitoring.
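On the Jetson, inference reduces to a loop that grabs the latest camera frame, runs the ACT policy, and sends the predicted joint targets to the arm. The sketch below mirrors a typical LeRobot policy rollout rather than our exact deployment code: the import path, observation key names, and the `camera`/`arm` wrappers are assumptions, and the checkpoint namespace is a placeholder.

```python
import torch
from lerobot.common.policies.act.modeling_act import ACTPolicy  # import path may differ by version

# Load the ACT checkpoint trained above; replace <user> with the real Hub namespace.
policy = ACTPolicy.from_pretrained("<user>/seeed_poopascoopa_v4")
policy.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
policy.to(device)

def control_step(camera, arm):
    """One real-time inference step: observation in, joint command out.
    `camera` and `arm` are hypothetical wrappers around the Jetson camera and SO-100 bus."""
    frame = camera.read()  # HxWx3 uint8 image
    obs = {
        "observation.images.cam": torch.from_numpy(frame).permute(2, 0, 1)[None].float().to(device) / 255.0,
        "observation.state": torch.tensor(arm.joint_positions())[None].float().to(device),
    }
    with torch.no_grad():
        action = policy.select_action(obs)  # ACT returns the next joint targets
    arm.send_joint_targets(action.squeeze(0).cpu().numpy())
```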
Visit trainthat.dog to remotely train your robot dog NOW! Control the arm, teach new behaviors, and help us build the ultimate home-cleaning robot dog companion.