This project develops an autonomous legged robot capable of detecting, tracking, and pursuing a ball in real time using computer vision and dynamic locomotion. It integrates perception, control, and mobility, serving as an educational platform for learning legged-robot dynamics while building skills in autonomous behaviors applicable to areas such as search-and-rescue or robotic entertainment. ROS 2 is employed for its robust middleware features, including real-time performance and distributed communication, enabling modular development of nodes for vision processing, state estimation, and commanding the base controller that generates the gait. Edge Impulse is used to train a lightweight machine learning model that detects and classifies the ball efficiently on resource-constrained hardware, enhancing the robot's real-time vision capabilities.
Hardware
We are using the Hiwonder MechDog Robot, equipped with 8 high-speed coreless servos driven by an ESP32-based controller.
For image capture, we are using the Hiwonder Vision Module, which features a 2-megapixel camera and is powered by the ESP32-S3 processor.
For model inference and running ROS 2 nodes, we utilize the Arduino UNO Q, powered by its Qualcomm Dragonwing QRB2210 microprocessor. This quad-core processor runs a Debian-based Linux environment, providing the necessary computational resources for efficient on-device machine learning inference, real-time ROS 2 communication, and complex processing tasks. Additionally, the entire system operates on battery power, enabling untethered mobility and extended operation for the autonomous quadruped robot.
The Arduino Uno Q is mounted on top of the robot and powered via the 5V pin on its header connectors.
We will operate the Arduino Uno Q in headless mode. Refer to the official Arduino documentation for instructions on initial setup and configuring SSH access.
https://docs.arduino.cc/tutorials/uno-q/ssh
ROS 2 Installation
ROS 2 lists Debian Linux as a Tier 3 platform, meaning no official binary packages are produced by the ROS build farm and ROS 2 must be built from source. Proceed with the steps below.
Set Locale
$ sudo apt update && sudo apt install locales
$ sudo locale-gen en_US en_US.UTF-8
$ sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
$ export LANG=en_US.UTF-8
Enable required repositories
As of this writing, no official ROS 2 apt repository supports Debian Trixie. A pull request on GitHub proposes adding support, but it has not yet been merged into the main branch, so no pre-built Debian packages are available for download. First, attempt to add and install from the standard ROS 2 repositories using the commands below. If that fails (as expected on Trixie), clone the repository (https://github.com/ros-infrastructure/ros-apt-source/tree/claraberendsen/add-trixie) and manually build the package on your Linux machine, following the included instructions.
$ sudo apt update && sudo apt install curl -y
$ export ROS_APT_SOURCE_VERSION=$(curl -s https://api.github.com/repos/ros-infrastructure/ros-apt-source/releases/latest | grep -F "tag_name" | awk -F\" '{print $4}')
$ curl -L -o /tmp/ros2-apt-source.deb "https://github.com/ros-infrastructure/ros-apt-source/releases/download/${ROS_APT_SOURCE_VERSION}/ros2-apt-source_${ROS_APT_SOURCE_VERSION}.$(. /etc/os-release && echo ${UBUNTU_CODENAME:-${VERSION_CODENAME}})_all.deb"
$ sudo dpkg -i /tmp/ros2-apt-source.deb
Install development tools
$ sudo apt update && sudo apt install -y \
python3-flake8-blind-except \
python3-flake8-class-newline \
python3-flake8-deprecated \
python3-mypy \
python3-pip \
python3-pytest \
python3-pytest-cov \
python3-pytest-mock \
python3-pytest-repeat \
python3-pytest-rerunfailures \
python3-pytest-runner \
python3-pytest-timeout \
ros-dev-tools
Get ROS 2 code
Building Qt-based GUI tools (such as RViz2) fails on the Arduino UNO Q due to conflicts with its GPU libraries. Since we have no intention of using any GUI tools on this platform, these packages can safely be skipped.
$ mkdir -p ~/ros2_jazzy/src
$ cd ~/ros2_jazzy
$ wget https://raw.githubusercontent.com/ros2/ros2/jazzy/ros2.repos
Edit the downloaded ros2.repos file to remove the entries for the ros-visualization and RViz-related repositories, then run the following command.
$ vcs import --input ros2.repos src
Install dependencies using rosdep
$ sudo apt upgrade
$ sudo rosdep init
$ rosdep update
$ rosdep install --from-paths src --ignore-src -y --skip-keys "fastcdr rti-connext-dds-6.0.1 urdfdom_headers"
Build the code in the workspace
$ cd ~/ros2_jazzy/
$ colcon build --symlink-install
I have attempted the steps above to install ROS 2, but the build may require additional adjustments. If compilation fails with out-of-memory errors, increase the swap file size to provide more virtual memory and build the workspace with limited parallelism to reduce peak memory usage. Some packages may fail to build because of unresolved dependencies; review the error messages in the build log to identify the missing libraries, then install them via apt if they are available in the Debian repositories, or build them from source if necessary. Non-essential packages whose dependencies cannot be resolved can be skipped so the rest of the installation can proceed. A few packages may also contain incorrect library paths in their CMakeLists.txt files, causing build failures even when all dependencies are installed; in such cases, manually edit the affected CMakeLists.txt to fix the paths and rebuild the workspace.
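For example, the following commands add a 4 GB swap file and run a memory-friendly, single-threaded build; the swap size is only a suggestion, so adjust it to the free storage on your board.
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ cd ~/ros2_jazzy/
$ MAKEFLAGS="-j1" colcon build --symlink-install --executor sequential --parallel-workers 1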
micro-ROS installation
The ESP32-S3 vision module and the ESP32 robot controller run micro-ROS to publish camera images, IMU data, and joint states over Wi-Fi via UDP. The Arduino Uno Q hosts the micro-ROS agent, which bridges these micro-ROS topics and services to the full ROS 2 ecosystem. Follow the instructions in the link below to install the micro-ROS agent on the Arduino Uno Q.
https://micro.ros.org/docs/tutorials/core/first_application_linux/
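For reference, the condensed flow from that tutorial looks roughly like the following; we assume the jazzy branch of micro_ros_setup and the ~/uros_ws workspace that is sourced later in this article.
$ source ~/ros2_jazzy/install/setup.sh
$ mkdir -p ~/uros_ws/src && cd ~/uros_ws
$ git clone -b jazzy https://github.com/micro-ROS/micro_ros_setup.git src/micro_ros_setup
$ rosdep update && rosdep install --from-paths src --ignore-src -y
$ colcon build && source install/setup.sh
$ ros2 run micro_ros_setup create_agent_ws.sh
$ ros2 run micro_ros_setup build_agent.sh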
Data Collection
We need to sign up for an account at Edge Impulse Studio and create a new project. We captured 182 images of a ball (with and without a hand present) at various distances using a smartphone, then uploaded them to Edge Impulse Studio. The images were labeled using the Labeling queue tab on the Data Acquisition page.
For model development, we need to create an impulse: a custom pipeline that combines image processing and a machine learning model. Navigate to the Impulse Design > Create Impulse page, select Add a processing block, and choose Image to preprocess and normalize the image data. On the same page, select Add a learning block and choose Object Detection (Images). We are using a 160x160 image size. Then, click the Save Impulse button.
Next, go to the Impulse > Image page and set the Color depth parameter to RGB.
Click the Save parameters button, which redirects to another page where we click the Generate features button.
To train the model, navigate to the Impulse > Object detection page. The object detection settings we selected are shown below.
We opted for the basic spatial and color space augmentations in the Advanced training settings.
We selected the latest YOLO-Pro model, which is well suited to detecting balls whose apparent size varies with distance from the camera and runs efficiently on the Qualcomm Dragonwing QRB2210 processor. Click the Save & train button to begin training.
Once training is finished, the training performance is shown as follows. The quantized (int8) model attained a 99.2% precision score on the training data.
To evaluate the model's performance on the test dataset, navigate to the Model testing page and click the Classify All button. The model performs even better on unseen data, achieving 100% accuracy.
As the model will run inference on the Qualcomm processor (CPU) of the Arduino Uno Q, we chose the C++ library (Linux) with the Quantized (int8) option on the Deployment page.
Click the Build button to compile and download the library bundle. We will use this library with the ROS 2 C++ node for the ball detection.
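The bundle is a zip archive that typically contains the edge-impulse-sdk, model-parameters, and tflite-model directories. As a rough sketch (the workspace path, package name, and archive name below are placeholders, not the actual repository layout), extract it into the detection package so the node can include the SDK header used in the code that follows.
$ # placeholder paths; adjust to your own workspace and download location
$ cd ~/ws_mechdog/src/object_detection
$ unzip ~/Downloads/<edge-impulse-project>-linux-cpp.zip -d .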
Application Development
The application uses separate ROS 2 nodes that run on different parts of the robot, keeping each task clearly divided. It takes images from the ESP32-S3 camera, processes them to find and track the ball, and streams the live video and results over HTTP so we can watch in real time. Each node does one job (such as detecting the ball or controlling the robot's walking), and they all work together to reach the goal.
ESP32-S3 micro-ROS image publisher
The vision module publishes a 160x160-pixel compressed image on the /image/compressed topic.
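With the micro-ROS agent running, we can quickly verify that frames are arriving on the Arduino Uno Q, for example:
$ ros2 topic hz /image/compressed
$ ros2 topic echo --no-arr /image/compressed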
ESP32 micro-ROS Robot Controller
The robot controller uses inverse kinematics to drive the 8 servos. It publishes 6-DOF IMU data and joint states, subscribes to the /cmd_vel topic for velocity commands, and offers a service to perform the ball-grab action.
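These interfaces can be exercised manually from the Arduino Uno Q before running the full application. As a quick smoke test (the service name and type match the follower node shown later), the first command publishes a single slow forward-velocity Twist and the second calls the ball-grab service:
$ ros2 topic pub --once /cmd_vel geometry_msgs/msg/Twist "{linear: {x: 0.1}, angular: {z: 0.0}}"
$ ros2 service call /mechdog/action std_srvs/srv/Trigger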
Arduino Uno Q ROS 2 nodes
The Arduino Uno Q serves as the primary compute unit, hosting the object detection, bounding box annotation, and object follower ROS 2 nodes. The object detection node subscribes to the /image/compressed topic published by the ESP32-S3 vision module and, using the YOLO-Pro model previously trained and deployed via Edge Impulse, performs real-time inference to detect the ball in each frame. Once the ball is detected, it computes the bounding box, center coordinates, and confidence score, then publishes the annotated data to downstream topics for use in tracking and control.
The full ROS 2 node graph, illustrating all topics, publishers, subscribers, and services across the system, is shown below.
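Since no GUI tools are installed on the board, the live graph can also be inspected from the command line on the Arduino Uno Q:
$ ros2 node list
$ ros2 topic list -t
$ ros2 service list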
Here is the code for the object_detection_node written in C++.
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/compressed_image.hpp>
#include <sensor_msgs/msg/image.hpp>
#include <opencv2/opencv.hpp>
#include <cv_bridge/cv_bridge.hpp>
#include <vision_msgs/msg/detection2_d.hpp>
#include <vision_msgs/msg/detection2_d_array.hpp>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"
class ObjectDetectionNode : public rclcpp::Node
{
public:
ObjectDetectionNode() : Node("object_detection_node")
{
auto qos = rclcpp::QoS(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_sensor_data));
qos.best_effort();
sub_ = this->create_subscription<sensor_msgs::msg::CompressedImage>(
"/image/compressed", qos,
std::bind(&ObjectDetectionNode::image_callback, this, std::placeholders::_1));
pub_det_ = this->create_publisher<vision_msgs::msg::Detection2DArray>(
"/edge_impulse/detections", 10);
pub_img_ = this->create_publisher<sensor_msgs::msg::Image>(
"/edge_impulse/img_bbox", 10);
RCLCPP_INFO(this->get_logger(), "Object Detection Node started.");
}
~ObjectDetectionNode()
{
}
private:
void image_callback(const sensor_msgs::msg::CompressedImage::SharedPtr msg)
{
try {
auto start_total = std::chrono::high_resolution_clock::now();
// Convert ROS CompressedImage -> OpenCV Mat (RGB8)
//cv::Mat img = cv_bridge::toCvCopy(msg, "rgb8")->image;
cv::Mat cropped = cv_bridge::toCvCopy(msg, "rgb8")->image;
//cv::Mat cropped;
//resize_and_crop(&img, &cropped);
size_t feature_ix = 0;
for (int rx = 0; rx < (int)cropped.rows; rx++) {
for (int cx = 0; cx < (int)cropped.cols; cx++) {
cv::Vec3b pixel = cropped.at<cv::Vec3b>(rx, cx);
// The Mat was converted to RGB8 above, so channel 0 is red and channel 2 is blue
uint8_t r = pixel.val[0];
uint8_t g = pixel.val[1];
uint8_t b = pixel.val[2];
// Pack each pixel as 0xRRGGBB, the layout the Edge Impulse classifier expects
features_[feature_ix++] = (r << 16) + (g << 8) + b;
}
}
ei_impulse_result_t result;
signal_t signal;
numpy::signal_from_buffer(features_, EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT, &signal);
EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false);
if (res != 0) {
RCLCPP_ERROR(this->get_logger(), "Failed to run classifier (%d)", res);
return;
}
RCLCPP_INFO(this->get_logger(), "Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n",
result.timing.dsp, result.timing.classification, result.timing.anomaly);
auto detections_msg = vision_msgs::msg::Detection2DArray();
detections_msg.header = msg->header;
for (size_t ix = 0; ix < result.bounding_boxes_count; ix++) {
auto bb = result.bounding_boxes[ix];
if (bb.value < 0.60f) {
continue;
}
vision_msgs::msg::Detection2D det2d;
det2d.bbox.center.position.x = bb.x + (bb.width / 2.0f);
det2d.bbox.center.position.y = bb.y + (bb.height / 2.0f);
det2d.bbox.size_x = bb.width;
det2d.bbox.size_y = bb.height;
vision_msgs::msg::ObjectHypothesisWithPose hyp;
hyp.hypothesis.class_id = bb.label;
hyp.hypothesis.score = bb.value;
det2d.results.push_back(hyp);
detections_msg.detections.push_back(det2d);
// scaled to original image
//cv::rectangle(img, cv::Rect(bb.x * 2, bb.y * 2, bb.width * 2, bb.height * 2), cv::Scalar(255, 255, 0), 2);
cv::rectangle(cropped, cv::Rect(bb.x, bb.y, bb.width , bb.height), cv::Scalar(255, 255, 0), 2);
}
//auto img_msg = cv_bridge::CvImage(std_msgs::msg::Header(), "rgb8", img).toImageMsg();
auto img_msg = cv_bridge::CvImage(std_msgs::msg::Header(), "rgb8", cropped).toImageMsg();
img_msg->header = msg->header; // preserve timestamp and frame_id
pub_det_->publish(detections_msg);
pub_img_->publish(*img_msg);
auto end_total = std::chrono::high_resolution_clock::now();
auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end_total - start_total).count();
RCLCPP_INFO(this->get_logger(), "Total processing + publish time: %ld ms", duration_ms);
}
catch (const cv_bridge::Exception& e) {
RCLCPP_ERROR(this->get_logger(), "cv_bridge exception");
}
catch (const std::exception& e) {
RCLCPP_ERROR(this->get_logger(), "Exception");
}
}
void resize_and_crop(cv::Mat *in_frame, cv::Mat *out_frame) {
float factor_w = static_cast<float>(EI_CLASSIFIER_INPUT_WIDTH) / static_cast<float>(in_frame->cols);
float factor_h = static_cast<float>(EI_CLASSIFIER_INPUT_HEIGHT) / static_cast<float>(in_frame->rows);
float largest_factor = factor_w > factor_h ? factor_w : factor_h;
cv::Size resize_size(ceil(largest_factor * static_cast<float>(in_frame->cols)),
ceil(largest_factor * static_cast<float>(in_frame->rows)));
cv::Mat resized;
cv::resize(*in_frame, resized, resize_size);
int crop_x = resize_size.width > resize_size.height ?
(resize_size.width - resize_size.height) / 2 : 0;
int crop_y = resize_size.height > resize_size.width ?
(resize_size.height - resize_size.width) / 2 : 0;
cv::Rect crop_region(crop_x, crop_y, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT);
*out_frame = resized(crop_region);
}
float features_[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT];
rclcpp::Publisher<vision_msgs::msg::Detection2DArray>::SharedPtr pub_det_;
rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr pub_img_;
rclcpp::Subscription<sensor_msgs::msg::CompressedImage>::SharedPtr sub_;
};
int main(int argc, char** argv)
{
rclcpp::init(argc, argv);
auto node = std::make_shared<ObjectDetectionNode>();
try {
rclcpp::spin(node);
}
catch (const std::exception& e) {
RCLCPP_ERROR(node->get_logger(), "Exception: %s", e.what());
}
rclcpp::shutdown();
return 0;
}
We can view the /edge_impulse/detections topic by running the following command.
$ ros2 topic echo /edge_impulse/detections
---
header:
stamp:
sec: 1767547141
nanosec: 2496106244
frame_id: /camera
detections:
- header:
stamp:
sec: 0
nanosec: 0
frame_id: ''
bbox:
center:
position:
x: 93.0
y: 76.0
theta: 0.0
size_x: 16.0
size_y: 16.0
id: ''
results:
- hypothesis:
class_id: ball
score: 0.8446003794670105
pose:
pose:
position:
x: 0.0
y: 0.0
z: 0.0
orientation:
x: 0.0
y: 0.0
z: 0.0
w: 1.0
---
Here is the code for the object_follower_node, written in Python.
import rclpy
import time
from rclpy.node import Node
from std_srvs.srv import Trigger
from sensor_msgs.msg import Range
from geometry_msgs.msg import Twist
from vision_msgs.msg import Detection2DArray
from rclpy.qos import QoSProfile, ReliabilityPolicy
class ObjectFollowerNode(Node):
def __init__(self):
super().__init__('object_follower_node')
self.declare_parameter('center_tolerance_px', 15)
self.declare_parameter('area_close_threshold', 4800)
self.declare_parameter('area_hysteresis', 0.85)
self.declare_parameter('obstacle_stop_distance', 0.18)
self.declare_parameter('linear_speed', 0.19)
self.declare_parameter('angular_speed', 1.2)
self.declare_parameter('slowdown_on_turn', True)
self.declare_parameter('smoothing_alpha', 0.4)
self.center_tol = self.get_parameter('center_tolerance_px').value
self.area_close = self.get_parameter('area_close_threshold').value
self.area_hyst = self.get_parameter('area_hysteresis').value
self.obstacle_dist = self.get_parameter('obstacle_stop_distance').value
self.base_linear = self.get_parameter('linear_speed').value
self.angular_k = self.get_parameter('angular_speed').value
self.slowdown = self.get_parameter('slowdown_on_turn').value
self.alpha = self.get_parameter('smoothing_alpha').value
self.image_center_x = 80.0
self.latest_range = 99.0
self.smoothed_area = 0.0
self.ball_detected = False
self.last_detection_time = 0.0
self.ball_cx = 80.0
self.state = "IDLE"
self.create_subscription(Detection2DArray, '/edge_impulse/detections',
self.detections_callback, 10)
qos = QoSProfile(depth=10, reliability=ReliabilityPolicy.BEST_EFFORT)
self.create_subscription(Range, '/front_range', self.range_callback, qos)
self.cmd_pub = self.create_publisher(Twist, '/cmd_vel', 10)
self.action_client = self.create_client(Trigger, '/mechdog/action')
while not self.action_client.wait_for_service(timeout_sec=1.0):
if not rclpy.ok():
break
self.create_timer(0.04, self.control_loop)
def detections_callback(self, msg):
if not msg.detections:
self.ball_detected = False
return
#self.get_logger().info(f"{msg.detections[0].results[0].hypothesis}")
det = max(msg.detections, key=lambda d: d.results[0].hypothesis.score if d.results else 0)
#self.get_logger().info(f"{det.results}")
if not det.results:
return
w = det.bbox.size_x
h = det.bbox.size_y
area = w * h
self.ball_cx = det.bbox.center.position.x
self.ball_detected = True
self.last_detection_time = time.time()
self.smoothed_area = self.alpha * area + (1.0 - self.alpha) * self.smoothed_area
self.get_logger().info(f"Area: {self.smoothed_area}")
def range_callback(self, msg):
if msg.min_range <= msg.range <= msg.max_range:
self.latest_range = msg.range
else:
self.latest_range = 99.0
#self.get_logger().info(f"Range: {self.latest_range}")
def control_loop(self):
self.get_logger().info(f"{self.state} {self.smoothed_area:0.1f}")
twist = Twist() # vx = 0 az = 0
if not self.ball_detected or (time.time() - self.last_detection_time > 0.5):
self.state = "IDLE"
self.cmd_pub.publish(twist)
return
error_x = self.ball_cx - self.image_center_x
error_norm = error_x / self.image_center_x
if self.latest_range < self.obstacle_dist:
if self.state != "OBSTACLE":
self.state = "OBSTACLE"
twist.linear.x = 0.0
twist.angular.z = 0.0
self.cmd_pub.publish(twist)
return
if self.smoothed_area > self.area_close:
if self.state != "BALL_CLOSE":
self.state = "BALL_CLOSE"
twist.linear.x = 0.0
twist.angular.z = 0.0
self.cmd_pub.publish(twist)
time.sleep(1)
self.trigger_action()
time.sleep(1)
return
self.state = "APPROACHING"
twist.angular.z = -self.angular_k * error_norm
twist.angular.z = max(-1.4, min(1.4, twist.angular.z))
speed = self.base_linear
if self.slowdown:
slow_factor = 1.0 - 0.7 * abs(error_norm)
speed *= max(0.35, slow_factor)
twist.linear.x = speed
self.cmd_pub.publish(twist)
def trigger_action(self):
#if self.action_client.wait_for_service(timeout_sec=0.5):
self.get_logger().info("trigger action")
self.action_client.call_async(Trigger.Request())
def destroy_node(self):
if rclpy.ok():
stop = Twist()
self.cmd_pub.publish(stop)
self.get_logger().info("Object detection node stopped")
super().destroy_node()
def main(args=None):
rclpy.init(args=args)
node = ObjectFollowerNode()
try:
rclpy.spin(node)
except KeyboardInterrupt:
pass
finally:
node.destroy_node()
rclpy.try_shutdown()
if __name__ == '__main__':
    main()
The complete ROS 2 workspace source code and the vision/controller module firmware are available in the GitHub repository listed in the Code section.
Launch the application
Execute the following commands to start the micro-ROS agent.
$ source ~/ros2_jazzy/install/setup.sh
$ source ~/uros_ws/install/setup.sh
$ ros2 run micro_ros_agent micro_ros_agent udp4 --port 8888
Execute the following commands in another terminal to start the object_follower_node.
$ source ~/ros2_jazzy/install/setup.sh
$ source ~/ws_mechdog/install/setup.sh
$ ros2 launch object_follower object_follower_node.launch.py
Annotated images with detection bounding boxes are streamed over HTTP for real-time viewing using the web_video_server package.
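If the launch file does not already start it, web_video_server can be run manually; it serves the annotated image topic over HTTP on its default port 8080, so the stream can be opened from a browser on another machine on the same network.
$ ros2 run web_video_server web_video_server
Then open http://<robot-ip>:8080/stream?topic=/edge_impulse/img_bbox in the browser.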
Although not needed for core operation, to visualize the robot motion in RViz2, execute the following command on the Arduino Uno Q. It launches the robot_state_publisher and robot_localization nodes and publishes the /odom topic.
$ ros2 launch mechdog_description robot.launch.py
To view the visualization in RViz2, use another computer on the same network with RViz2 installed.
$ rviz2
The ROS 2 robot_localization package fuses the IMU data with odometry estimated from the commanded velocities (the joint servos provide no position feedback).
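Before opening RViz2, it is worth confirming on the Arduino Uno Q that the fused odometry is being published:
$ ros2 topic hz /odom
$ ros2 topic echo /odom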
A dedicated battery_node monitors the robot's 2S Li-ion battery voltage and displays the reading on the Arduino Uno Q's LED matrix.
The LED matrix display is controlled by an STM32U585 microcontroller on the Arduino Uno Q. The corresponding Arduino sketch is provided below.
#include "ArduinoGraphics.h"
#include "Arduino_LED_Matrix.h"
ArduinoLEDMatrix matrix;
void setup() {
Serial1.begin(115200);
matrix.begin();
matrix.stroke(0xFFFFFFFF);
matrix.textFont(Font_5x7);
}
void loop() {
if (Serial1.available()) {
float volt = Serial1.parseFloat();
if (volt != 0) {
if (volt > 3.0f && volt < 9.0f) {
matrix.clear();
char s[4];
sprintf(s, "%0.1f", volt);
matrix.beginDraw();
matrix.beginText(0, 1, 0xFFFFFF);
matrix.println(s[0]);
matrix.endText();
matrix.beginText(4, 1, 0xFFFFFF);
matrix.println(s[1]);
matrix.endText();
matrix.beginText(8, 1, 0xFFFFFF);
matrix.println(s[2]);
matrix.endText();
matrix.endDraw();
delay(1000);
}
}
}
}
Live Demo
Conclusion
The legged robot effectively performs real-time ball pursuit by combining onboard machine-learning-based detection with responsive gait generation and control. The hybrid architecture (full ROS 2 on the primary processor and micro-ROS on the peripheral microcontrollers) enables reliable, low-latency communication and modular software design despite tight hardware constraints. Deploying an Edge Impulse-optimized model directly on embedded hardware ensures fast and efficient object recognition, supporting fluid tracking and navigation in dynamic settings. This work establishes a reproducible, scalable framework for building sophisticated behaviors on compact platforms, while providing an accessible hands-on environment for studying mobile robotics. The approach opens pathways for practical deployment in areas requiring agile, perceptive robots, such as exploration tasks or interactive systems, and underscores the value of open-source tools in enabling advanced robotics on limited resources.