A New Dimension in Computer Vision
MonoXiver improves the accuracy of 3D object detection from 2D images, making low-cost camera-based systems viable for safety-critical applications.
For many applications, like self-driving vehicles, autonomous drones, and industrial robots, it is essential that the system gain a clear understanding of its environment. This understanding extends beyond merely recognizing that objects are present; it requires a grasp of their three-dimensional spatial layout. Three-dimensional object localization and mapping play a pivotal role in achieving this level of environmental awareness. By accurately determining the location and orientation of objects in three-dimensional space, these technologies enable autonomous systems to navigate complex terrain, make informed decisions, and execute tasks with precision and safety.
Whether it is a self-driving car avoiding collisions with pedestrians, a drone maneuvering through a cluttered urban landscape, or a robot manipulating objects in a manufacturing facility, the ability to locate and interact with objects in three-dimensional space is the linchpin for their successful deployment in real-world scenarios. However, the technologies that enable three-dimensional object detection, like LiDAR, can be prohibitively expensive for many use cases.
Accordingly, less expensive, traditional two-dimensional cameras are often used for this purpose. Of course, two-dimensional cameras do not directly provide the needed three-dimensional information, so a number of techniques have been developed to infer the positions of objects in three-dimensional space. While many advances have been made, and these methods often work quite well, they still leave much to be desired. Existing algorithms commonly fail to capture portions of detected objects, for example. As such, they fall short of the reliability demanded of safety-critical applications.
A collaborative effort led by researchers at North Carolina State University has resulted in a new method to extract three-dimensional object locations from two-dimensional images. By taking a multi-step approach to the problem, the team has shown that their algorithm can not only locate objects in space but also capture the full extent of each object, even when it has a complex or irregular shape. And importantly, the algorithm is very lightweight, which makes it useful for real-time computer vision applications.
Commonly, the starting point for inferring three-dimensional object locations from image data is drawing bounding boxes around each object. These boxes help the algorithm estimate key properties, like the size of the object and how far away it is. Unfortunately, existing algorithms frequently miss portions of the object when drawing these boxes, which in turn leads to errors in downstream calculations.
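To see why these boxes carry so much geometric information, consider a rough back-of-the-envelope sketch (not the team's method) of how a 2D box can yield a distance estimate under a simple pinhole camera model, assuming the typical real-world height of the object class is roughly known:

```python
# A rough illustration (not the team's method): estimating distance from a 2D
# bounding box with a pinhole camera model, assuming the real-world height of
# the object class is approximately known (e.g., ~1.5 m for a typical car).

def estimate_depth_from_box(box_top_px, box_bottom_px, focal_length_px, real_height_m=1.5):
    """Approximate object distance (meters) from the pixel height of its 2D box."""
    pixel_height = box_bottom_px - box_top_px
    if pixel_height <= 0:
        raise ValueError("bounding box must have positive height")
    # Pinhole projection: pixel_height / focal_length = real_height / depth
    return focal_length_px * real_height_m / pixel_height

# Example: a car whose box is 120 pixels tall, seen by a camera with a 720-pixel focal length
print(estimate_depth_from_box(300, 420, focal_length_px=720.0))  # ~9.0 m
```

An error of even a few pixels in the box, as happens when part of the object is missed, shifts this kind of depth estimate noticeably, which is why poorly fitted boxes propagate into large three-dimensional errors.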
The team’s new method, called MonoXiver, uses the same bounding boxes as a starting point, but then performs a secondary analysis. In this next step, the area immediately surrounding each bounding box is explored. The algorithm examines the geometry and color of the surrounding areas to see if they are likely to be part of the object or irrelevant background. In this way, the precise location and extent of the object can be determined.
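The published details of MonoXiver are more involved, but the general "propose and verify" idea can be sketched roughly as follows. Everything here, from the Box3D layout to the placeholder scoring function, is an illustrative assumption rather than the team's actual code:

```python
# Loose sketch of a "propose and verify" refinement step; this is not the
# published MonoXiver implementation. The box layout, offsets, and scorer
# are illustrative placeholders.
import itertools
import random
from dataclasses import dataclass, replace

@dataclass
class Box3D:
    x: float    # lateral position (m)
    y: float    # vertical position (m)
    z: float    # depth from the camera (m)
    w: float    # width (m)
    h: float    # height (m)
    l: float    # length (m)
    yaw: float  # heading angle (rad)

def generate_proposals(initial_box, offsets=(-0.5, 0.0, 0.5)):
    """Perturb the initial 3D box within a small horizontal neighborhood around it."""
    for dx, dz in itertools.product(offsets, offsets):
        yield replace(initial_box, x=initial_box.x + dx, z=initial_box.z + dz)

def score_proposal(box, image, camera):
    """Placeholder scorer: a real system would compare appearance (color) and
    geometry features inside the projected proposal against the anchor box."""
    return random.random()  # stand-in only; not a meaningful score

def refine(initial_box, image, camera):
    """Keep whichever perturbed proposal the scorer prefers."""
    return max(generate_proposals(initial_box),
               key=lambda b: score_proposal(b, image, camera))
```

In a real system, the scorer would weigh the image evidence around each perturbed proposal, keeping the one that best explains where the object actually ends and the background begins.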
This additional processing does add some overhead, naturally, but it is within reason for real-time applications. Using their test setup, the researchers found that they could detect object bounding boxes at 55 frames per second. Adding the secondary analysis trimmed that rate to 40 frames per second, which is still acceptable for most use cases.
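Expressed as per-frame latency, the reported numbers work out as follows:

```python
# Converting the reported throughput to per-frame latency.
base_fps, refined_fps = 55.0, 40.0
base_ms = 1000.0 / base_fps        # ~18.2 ms per frame for bounding-box detection alone
refined_ms = 1000.0 / refined_fps  # 25.0 ms per frame with the secondary analysis
print(f"Added latency: {refined_ms - base_ms:.1f} ms per frame")  # ~6.8 ms
```

In other words, the secondary analysis costs roughly 7 milliseconds per frame on the researchers' hardware.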
Several experiments were conducted using the well-known KITTI and Waymo datasets. When combined with three leading approaches for extracting three-dimensional object locations from images, MonoXiver significantly improved performance in all cases. Encouraged by these results, the team is presently working to further improve the performance of their tool. They hope to see it put to use in many applications, like self-driving cars, in the future.