Object recognition is a critical piece of many machine learning applications. Whether the goal is to create an autonomous car, a warehouse robot, or a package delivery drone, the device must be capable of recognizing the objects around it. There are many proven models that classify a wide range of objects with a high degree of accuracy; however, these models do not always perform as expected in real-world scenarios.
One reason these models can fall flat is that different objects may be indistinguishable from one another at certain angles. For example, the flavor of a certain type of packaged food may only be visible on one side of the box. Viewed from any other side, the box cannot be clearly identified. To get around problems such as this, researchers at the University of Genoa have developed a new technique that allows machines to get a better view of their surroundings and to recognize objects without ambiguity.
To accomplish this, the team developed an algorithm to compute an ambiguity rank, a metric designed to measure how similar a pair of images is. By calculating this value, the team was able to remove ambiguous images from a set of training data, such that a classification model could be trained solely on non-ambiguous example images. This solves one problem: it prevents a model from failing to converge because identical-looking images carry different labels. It does not, however, prevent a device in the wild from encountering an ambiguous view of an object.
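The filtering step can be illustrated with a minimal sketch. The article does not describe how the ambiguity rank is actually computed, so the code below substitutes plain cosine similarity between feature vectors as a stand-in. The function names, the 0.95 threshold, and the toy feature vectors are all assumptions made for illustration, not part of the researchers' package.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (stand-in for the ambiguity rank)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_ambiguous(samples, threshold=0.95):
    """Drop every sample that closely resembles a sample with a different label.

    samples: list of (feature_vector, label) tuples.
    threshold: hypothetical cutoff; pairs at or above it count as ambiguous.
    """
    ambiguous = set()
    for (i, (feat_i, label_i)), (j, (feat_j, label_j)) in combinations(
        enumerate(samples), 2
    ):
        if label_i != label_j and cosine_similarity(feat_i, feat_j) >= threshold:
            ambiguous.update((i, j))
    return [sample for k, sample in enumerate(samples) if k not in ambiguous]


# Toy data: the first two views look almost identical but carry different labels.
samples = [
    ([1.0, 0.0, 0.0], "cereal_chocolate"),   # plain side of the box
    ([1.0, 0.01, 0.0], "cereal_vanilla"),    # plain side of the other box
    ([0.0, 1.0, 0.0], "cereal_chocolate"),   # front, flavor visible
    ([0.0, 0.0, 1.0], "cereal_vanilla"),     # front, flavor visible
]
clean = filter_ambiguous(samples)
print([label for _, label in clean])
```

In this toy example only the two front views survive, so a classifier trained on `clean` never sees near-identical inputs with conflicting labels.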
To deal with this next problem, the team used the same ambiguity rank metric. In this case, if the measurement is above a certain threshold, the camera is moved to capture the object from a different angle. By doing so, a non-ambiguous image of the object can be captured, and it can then be accurately classified.
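The move-until-unambiguous loop might be sketched as follows. This is an illustration under stated assumptions rather than the team's implementation: `capture`, `ambiguity_rank`, `classify`, and `move_camera` are hypothetical callbacks standing in for the camera, the metric, the classifier, and the robot controller, and the threshold and move limit are arbitrary values.

```python
def classify_with_active_vision(
    capture, ambiguity_rank, classify, move_camera, threshold=0.5, max_moves=5
):
    """Reposition the camera until the view is unambiguous, then classify it.

    All four callables are hypothetical placeholders supplied by the caller.
    Returns (label, moves_used), or (None, max_moves) if every view was ambiguous.
    """
    for moves_used in range(max_moves + 1):
        image = capture()
        # A rank above the threshold means the view is too ambiguous to trust.
        if ambiguity_rank(image) <= threshold:
            return classify(image), moves_used
        if moves_used < max_moves:
            move_camera()
    return None, max_moves


# Simulated object: two ambiguous side views, then a clear front view.
views = [
    {"ambiguity": 0.9, "label": "unknown"},
    {"ambiguity": 0.7, "label": "unknown"},
    {"ambiguity": 0.2, "label": "cereal_vanilla"},
]
state = {"index": 0}


def capture():
    return views[state["index"]]


def move_camera():
    state["index"] = min(state["index"] + 1, len(views) - 1)


result = classify_with_active_vision(
    capture,
    ambiguity_rank=lambda view: view["ambiguity"],
    classify=lambda view: view["label"],
    move_camera=move_camera,
)
print(result)
```

Capping the number of moves keeps the loop from running forever when no available viewpoint is unambiguous; the caller can then treat a `None` label as a failed classification.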
A number of trials were conducted to validate the system. In one case, an online active vision test was performed, in which a robot arm would inspect an object from a random starting viewpoint. If the ambiguity rank was above a preset threshold, the robot would adjust the camera to capture the object from another angle. Of the 230 tests of this sort, 210 resulted in a successful classification, an accuracy of about 91.3%.
The entire pipeline is available as a Python package for anyone who is interested in incorporating it into their own projects.