Autonomous Navigation
Potential Challenges in Modern Autonomous Navigation Systems
Contribution: Virtual Guidance
Sim-to-Real Transferring of the Control Policy
Overview of the Proposed Modular Framework
Experiments, Summary, and Demonstration
Code Manual and Technical Report

Team Do You Wanna Build a Snowman:

•

•

•

•

•

•

Published March 11, 2020

Sim-to-Real: Virtual Guidance for Robot Navigation

We present an effective, easy-to-implement, and low-cost modular framework for completing complex navigation tasks.

AdvancedFull instructions provided20 hours12,122

Autonomous Machines & Robotics: 2nd Place

AI at the Edge Challenge

Things used in this project

Hardware components

NVIDIA Jetson Nano Developer Kit

NVIDIA Jetson AGX Xavier

ZED Stereo Camera

ZED is a stereo camera which is originally used for long-range depth perception. However, in our scenario, ZED acts like a normal camera. You can re-implement our project with other RGB cameras.

Clearpath Husky AGV

We use Clearpath Husky as our autonomous guided vehicle.

TP-Link TL-WR840N Wireless Router

GeChic 1102H On-Lap Portable Monitor

Software apps and online services

TensorFlow

Unity

Story

Introduction Video

Autonomous navigation has been attracting attention in recent years, especially for maneuvering and controlling autonomous guided vehicles (AGVs) which include self-driving cars and delivery robots.

Although autonomous systems are promising and are expected to have numerous potential applications, developing a practical one is not a straightforward and easy task. A visual-based autonomous navigation system typically requires four essential functionalities: visual perception, localization, navigation, and obstacle avoidance. In order to meet these requirements, conventional AGVs rely heavily on sensor inputs obtained from devices such as high-end cameras, LIDARs, GPSes, and other localization sensors. Although such devices provide solutions to the above requirements, they are mostly expensive and thus are unable to satisfy the need for mass production.

In this project, we propose to overcome the above challenge by using several state-of-the-art techniques to exclude the need for any high-end sensors. Only a single RGB camera mounted on the AGV is used as input. We embrace techniques such as semantic segmentation with deep neural networks (DNNs), simultaneous localization and mapping (SLAM), path planning algorithms, as well as deep reinforcement learning (DRL) to implement the four functionalities mentioned above, respectively. In our system, visual perception and localization are performed by our perception and localization modules using the image frames captured by the RGB camera. Navigation and obstacle avoidance, on the other hand, is performed by our planner module together with a DRL agent trained in virtual environments simulated by the Unity engine. In order to guide the DRL agent to navigate to the destination while avoiding obstacles, we introduce a new concept called “virtual guidance” in our autonomous system. Virtual guidance enables our DRL agent to be guided by the planner module by setting up intermediate virtual waypoints along the AGV’s path. The concept of virtual guidance is introduced in the following section. Our experimental results show that the proposed methodology is able to guide our AGV to pass through crowded environments, and navigate it to the destination with high success rates both in indoor and outdoor environments.

Contribution: Virtual Guidance

Virtual Guidance Scheme

We introduce “virtual guidance”, a simple but effective way to communicate the navigation path to our DRL agent (i.e., the AGV). Virtual guidance is designed as a virtual lure for the DRL agent. During the training phase, the DRL agent is trained to follow the virtual guides in our simulated environments so as to obtain rewards. During the deployment phase (i.e., in the real world), a virtual guide is rendered as a yellow ball on an appropriate location of the semantic segmentation generated by the perception module, such that our AGV is able to follow the path created by the planner module.

The rendered position of the virtual guide is dynamically adjusted by the planner module based on a pool of pre-defined candidate locations. If there exists an obstacle blocking the desired path to be followed, the virtual guide is temporarily de-rendered so as to allow the AGV to focus on obstacle avoidance. If the AGV is deviated from is expected direction after performing obstacle avoidance, the virtual guide is rendered back on the semantic segmentation to redirect it to the right path. Please refer to our technical report for more details.

Sim-to-Real Transferring of the Control Policy

In this project, we first train the DRL agent in our simulated environments, then transfer it to the real world to navigate our AGV. This approach is called “sim-to-real transferring”, which provides several benefits. First, it allows the DRL agent to collect experience in a fast and efficient manner (i.e., by interacting with the simulated environments), eliminating the requirement of time-consuming interactions with the real world. Second, it eliminates potential damage to the AGV as well as real-world objects during the training phase. Third, it enables parallel training, for multiple DRL agents can be trained concurrently in the simulated environments.

Although sim-to-real transferring provides these advantages, the reality gap between the virtual and real environments (e.g., the difference in lightening conditions, textures, etc.) poses challenges when transferring the DRL agent trained in the virtual world to the real world. In order to overcome the challenge posed by the reality gap, we employ semantic segmentation as the intermediate representation to bridge different modules in this project. The rationale is that even though the visual appearances of the images from virtual and real worlds may differ, their semantic segmentation can be similar (as illustrated in the above figure and explained in the technical report). This property allows the control policy of the DRL agent to be seamlessly transferred from the virtual world to the real world, as long as the DRL agent is trained to perform actions based on the observations of semantic segmentations during the training phase.

Overview of the Proposed Modular Framework

As illustrated in the above figure, our framework consists of four modules: a perception module, a localization module, a planner module, and a control policy module. The entire framework is implemented on an embedded cluster consisting of two NVIDIA Jetson Nanos and two NVIDIA Jetson Xaviers. The Jetson embedded devices are connected via a router and ethernet links. The communication of the four modules, as well as the peripheral components (e.g., the RGB camera and the AGV), are carried out via the robot operating system (ROS). A high-level description of the proposed framework is covered in the introduction video of this webpage. The implementation details of the framework are provided in the technical report.

Perception Module

The main function of the perception module is to translate input RGB image frames into semantic segmentations. In this project, the perception module is implemented as a deep convolutional neural network (DCNN) architecture “DeepLabv3”, which is a highly accurate model for generating semantic segmentations. However, please note that other semantic segmentation models can be used as the perception module in the proposed framework as well. In our system, the perception module is executed on an NVIDIA Jetson Xavier due to the relatively larger model size and the demand for higher computational power.

Localization Module

The function of the localization module is to generate the current “pose” of the AGV according to the features captured from the input RGB frames. In this project, we embrace ORB-SLAM2 as our localization module. ORB-SLAM2 is a highly accurate visual SLAM algorithm that requires only RGB inputs from a single monocular camera. The localization module is executed on an NVIDIA Jetson Xavier due to its larger memory size and the higher computational power.

Planner Module

The function of the planner module is to derive a path according to the AGV’s current pose and its final destination based on the famous A* algorithm. This module is also responsible for rendering the “virtual guide” on the semantic segmentation generated by the perception module based on its derived path. The planner module is executed on an NVIDIA Jetson Nano, which also serves as a node of ROS for coordinating the entire embedded cluster.

Control Policy Module

The control policy module acts as a local planner in our framework (while the planner module serves as a high-level planner). The control policy module is implemented as a DRL agent and is mainly developed to concentrate on avoiding obstacles and following the virtual guide, so as to lead the AGV to its final destination. During the training phase, we employ the famous Proximal Policy Optimization (PPO) algorithm to train the DRL agent in our virtual worlds simulated with Unity. During the deployment phase (i.e., in the real world), the control policy module (i.e., the DRL agent) takes the semantic segmentation rendered with the proposed “virtual guide” to navigate the AGV to its destination. The control policy module is executed on an NVIDIA Jetson Nano board.

Experiments, Summary, and Demonstration

In order to validate the proposed framework, we perform experiments in simulated environments as well as real-world indoor and outdoor environments. The AGV is tested in various scenarios with different navigation routes and dynamic objects. The evaluation results suggest that the proposed framework is effective in guiding the AGV to its destinations with high success rates. The detailed experimental setups and the summary of the performed experiments are provided in the technical report. Overall, we proposed an effective, easy-to-implement, and low-cost framework. With just a single camera and a few edge-computing devices, we make autonomous navigation more realistic and affordable. We believe that this project opens a new avenue for future researches in vision-based autonomous navigation. The demonstration video of our work is as follows.

Demonstration Video

Code Manual and Technical Report

- Technical Report

- Code Manual

- Github

AGV

The figure below illustrates the computer-aided design (CAD) of our AGV system. The robot base is an AGV from Clearpath Robotics Inc. The AGV takes action commands from the control policy module of the proposed framework discussed above. On top of the AGV are the RGB camera, the monitor, and the embedded cluster including two NVIDIA Jetson Nano boards and two NVIDIA Jetson Xavier boards. A router and ethernet links are used to connect the components of the AGV system. Please note that although the RGB camera is a stereo camera “Zed”, its depth estimation feature is not used in this project. We select this camera because it supports ROS and can thus be easily integrated into our system.