In the world of hobbyist robotics, we’ve moved past simple line-following and obstacle avoidance. The new frontier is Embodied AI—systems that don't just "run code," but actually perceive, reason, and act within a dynamic environment. LanderPi is a composite robot designed to showcase this "triple threat" integration: SLAM navigation, Multimodal Large Language Models (LLMs), and 3D computer vision.
The Stack: Hardware & Software Synergy
To bridge the gap between digital "thinking" and physical "doing," LanderPi utilizes a robust tech stack:
- Brain: Raspberry Pi 5 acting as the primary host.
- Senses: High-performance TOF LiDAR and a 3D Depth Camera.
- Action: A 6-DOF robotic arm driven by high-torque encoder motors.
- Middleware: ROS 2 (Humble/Foxy) for orchestration.
- Intelligence: YOLOv11 for real-time detection, MoveIt for motion planning, and integrated APIs for LLMs like DeepSeek or Qwen.
The "Grand Challenge": The Smart Community Runner
To see how these layers work together, imagine a "Smart Community" scenario. You give LanderPi a complex, natural language command:
"Hey Hiwonder, pick up that wooden block 'trash' and take it to the recycling bin. Then, head to the market to see what fruits are in stock, check the garden for the dog, and finally, grab my red package from the station and bring it home."
In traditional robotics, this would require a massive "if-then" script. With LanderPi’s integrated stack, the execution is far more elegant.
1. Semantic Intent Parsing (The LLM Layer)
When the voice command is received, the LLM doesn't look for keywords; it performs semantic parsing. It identifies the sequence of tasks (pick, place, inspect, fetch), the target objects (trash, fruit, dog, package), and the geographic locations (market, garden, station). The LLM acts as the high-level mission planner, breaking the "blurred intent" into a logical task tree.
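To make the idea concrete, here is a minimal sketch of how that task tree might be represented and parsed. The JSON schema and field names are illustrative assumptions, not LanderPi's actual protocol; in a real setup the response string would come from a DeepSeek or Qwen API call, which is hard-coded here.

```python
import json

# Hypothetical LLM reply: in practice this string would be returned by a
# DeepSeek/Qwen API call given the voice command above. Hard-coded here.
llm_response = json.dumps({
    "tasks": [
        {"action": "pick_place", "object": "wooden block", "destination": "recycling bin"},
        {"action": "inspect", "target": "fruit", "location": "market"},
        {"action": "inspect", "target": "dog", "location": "garden"},
        {"action": "fetch", "object": "red package", "from": "station", "to": "home"},
    ]
})

def parse_mission(response: str) -> list[dict]:
    """Turn the LLM's structured JSON reply into an ordered task list."""
    plan = json.loads(response)
    return plan["tasks"]

# The mission planner would then dispatch each step to navigation/manipulation.
for step in parse_mission(llm_response):
    print(step["action"])
```

The key design point is that the LLM is asked for machine-readable structure rather than free text, so the downstream layers never have to guess at keywords.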
2. Autonomous Navigation (The SLAM Layer)
Once the plan is set, the robot engages its "Internal GPS." Using the TOF LiDAR, LanderPi either localizes itself on a pre-built map or performs real-time SLAM. It fuses A* global planning with the TEB local planner, allowing it to navigate from the trash zone to the market while dynamically swerving around pedestrians or delivery scooters.
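The global-planning half of that pairing can be sketched in a few lines. The following is a toy A* search on a 4-connected occupancy grid, not LanderPi's actual planner (which runs on a real costmap inside the ROS 2 navigation stack), but it shows the core idea: expand the cheapest node by cost-so-far plus a heuristic until the goal is reached.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns the path as a list of (row, col) cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    best_g = {start: 0}
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

# A small map with a wall forcing a detour around (1, 0) and (1, 1)
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

In the real system, A* produces the coarse route over the map while the TEB local planner continuously deforms the trajectory around dynamic obstacles the LiDAR picks up.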
3. Precision Interaction (3D Vision & MoveIt)
When it reaches the "trash" or "package," the 3D depth camera takes over. By processing point clouds and running YOLOv11, the robot identifies the object's precise 3D coordinates. The MoveIt motion planning framework then calculates the optimal trajectory for the 6-DOF arm, adjusting the gripper's posture in real-time to ensure a secure grasp.
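The step from a 2D detection to a 3D grasp target is worth spelling out. A YOLO bounding box gives a pixel location; combined with the depth camera's range reading, the standard pinhole back-projection recovers the camera-frame XYZ point that gets handed to MoveIt. The intrinsics below (fx, fy, cx, cy) are illustrative placeholder values, not LanderPi's actual camera calibration.

```python
def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a depth reading (meters) into a
    camera-frame XYZ point using the pinhole camera model.
    fx/fy are focal lengths in pixels; (cx, cy) is the principal point."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Example: a detection centered at pixel (400, 300), 0.5 m from the camera,
# with made-up intrinsics for illustration.
point = pixel_to_3d(400, 300, 0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

That XYZ point (after a transform into the arm's base frame) is what the motion planner treats as the grasp target.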
4. Cognitive Scene Understanding (The VLM Layer)
For tasks like "checking for the dog" or "identifying fruit," the robot isn't just looking for a match; it’s understanding the scene. The Vision-Language Model (VLM) analyzes the live feed to provide descriptive feedback: "I see apples and bananas at the market," or "The dog is currently not in the garden." This transforms the robot from a tool into an intelligent observer.
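The last mile of that loop, turning the VLM's structured output into the spoken feedback quoted above, can be sketched as follows. The VLM call itself is omitted; the detection lists are hard-coded stand-ins for what the model would return, and the function names are illustrative, not part of any LanderPi API.

```python
def describe_scene(location: str, target: str, detections: list[str]) -> str:
    """Turn a list of VLM-detected labels into a natural-language report.
    `detections` stands in for the VLM's output on the live camera feed."""
    if target == "fruit":
        if detections:
            return f"I see {' and '.join(detections)} at the {location}."
        return f"I don't see any fruit at the {location}."
    if target in detections:
        return f"The {target} is in the {location}."
    return f"The {target} is currently not in the {location}."

print(describe_scene("market", "fruit", ["apples", "bananas"]))
print(describe_scene("garden", "dog", []))
```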
Conclusion: The Future of Embodied AI
The power of LanderPi lies in its ability to unify low-level motor control, mid-level perception, and high-level cognitive reasoning into a single, organic system. It represents the shift from robots that follow "pre-set paths" to agents that understand "natural language missions."
Whether you are a researcher in embodied AI or a developer looking to push the limits of ROS 2, the LanderPi offers a transparent, open-source platform to explore the next generation of robotics.