Setting the Tone in Computer Vision
This lightweight processing algorithm helps self-driving cars stay safe by making use of data from high dynamic range image sensors.
The dynamic range of a digital camera is not a specification that most of us pay much attention to. It quantifies the range of light intensities that an image sensor can capture, from the darkest shadows to the brightest highlights. Most modern digital cameras capture a wide enough range that the average consumer never needs to think about it. But in specialized applications, like self-driving cars, dynamic range can be of critical importance.
Consider, for example, a sudden flash from an oncoming headlight during the night, or the sun on the horizon in the morning. These situations can completely blind a standard image sensor, and that loss of information, even temporarily, can lead to disaster. High dynamic range (HDR) sensors have the potential to overcome these problems by accurately representing a much wider range of light intensities.
However, the data produced by these HDR sensors causes some problems of its own. Their high-bit-depth output cannot be displayed on a traditional monitor, for example. This issue is dealt with through a process called tone mapping, which adjusts pixel intensities to improve the contrast and color tones of the final image and make it compatible with existing displays and processing algorithms.
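To make the idea of tone mapping concrete, here is a minimal sketch of a classic global operator (the extended Reinhard curve, a textbook example rather than the method discussed in this article) that compresses a wide range of luminance values into the displayable range:

```python
def reinhard_tonemap(luminance, white_point=4.0):
    """Compress an HDR luminance value into [0, 1] with the extended
    Reinhard global operator: L_out = L * (1 + L / Lw^2) / (1 + L).
    Values at or above the white point map to full brightness."""
    mapped = luminance * (1.0 + luminance / (white_point ** 2)) / (1.0 + luminance)
    return min(1.0, mapped)

# A five-order-of-magnitude spread of scene intensities all land in
# display range after tone mapping.
hdr_values = [0.01, 0.5, 1.0, 10.0, 100.0]
ldr_values = [reinhard_tonemap(v) for v in hdr_values]
```

Because a single curve is applied to every pixel, global operators like this are cheap to run, which is exactly why they are attractive for embedded hardware.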
Lightweight tone mapping algorithms exist today, but they are known for introducing artifacts that are unacceptable for most applications. These artifacts can be eliminated with more computationally intensive algorithms, but that increases the cost and size of the required hardware components. A team at Dream Chip Technologies GmbH has landed on a novel approach that hits the sweet spot between accuracy and computational complexity in tone mapping. Their proposed TinyML-based global tone mapping method, called TGTM, outperforms state-of-the-art methods on HDR camera images while requiring orders of magnitude fewer computations.
This was achieved by developing a computationally efficient convolutional neural network (CNN) that derives tone curve function parameters from image histograms. The CNN takes a 26-bit HDR image as input and converts it to a 12-bit low dynamic range image that can be displayed on a traditional monitor and is compatible with existing downstream processing algorithms. The approach was specifically designed to be integrated into the image signal processor of a standard digital camera.
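The bit-depth conversion itself can be sketched with a simple parametric curve. In the sketch below, the curve is a plain power law and `gamma` is a placeholder parameter; in TGTM the curve parameters would come from the network's prediction on the image histogram, and the actual curve family is not specified here:

```python
BITS_IN, BITS_OUT = 26, 12
MAX_IN = (1 << BITS_IN) - 1    # largest 26-bit code value
MAX_OUT = (1 << BITS_OUT) - 1  # largest 12-bit code value

def apply_tone_curve(pixel, gamma=0.45):
    """Map a 26-bit HDR code value to a 12-bit output through a
    power-law tone curve. `gamma` stands in for the parameters a
    histogram-driven network would predict per image."""
    normalized = pixel / MAX_IN
    return round((normalized ** gamma) * MAX_OUT)
```

Note how a gamma below 1 lifts the midtones, which is the usual goal when squeezing HDR data onto a lower-bit-depth display path.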
Since the neural network is trained via supervised learning, a large number of input images and corresponding target output images were needed for the algorithm to learn from. Producing such a dataset by hand would be very labor-intensive and expensive, so the team instead devised a data simulation approach that allowed them to quickly generate synthetic data to train the model.
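One common recipe for this kind of simulation, sketched below purely as an illustration (the scene model, histogram size, and labeling rule are all assumptions, not the team's published pipeline), is to sample random scene statistics, build an input histogram from them, and label it with the parameter a known forward model says is correct:

```python
import random

def make_training_pair(num_bins=64, rng=random.Random(0)):
    """Generate one synthetic (histogram, target parameter) pair.
    The network would learn the histogram -> parameter mapping
    from many such pairs."""
    # Random overall scene brightness stands in for simulated content.
    mean_level = rng.uniform(0.05, 0.95)
    histogram = [0] * num_bins
    for _ in range(1000):
        # Pixel samples cluster around the scene mean; clamp to [0, 1).
        v = min(max(rng.gauss(mean_level, 0.15), 0.0), 0.999)
        histogram[int(v * num_bins)] += 1
    # "Ground truth" label from a simple rule: darker scenes get a
    # lower gamma (stronger brightening).
    target_gamma = 0.3 + 0.5 * mean_level
    return histogram, target_gamma
```

Because the labels are computed rather than hand-authored, thousands of such pairs can be generated in seconds, which is the whole appeal of the synthetic approach.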
The performance of the pipeline was first assessed by testing it with simulated data and comparing the predictions with ground truth data. Overall, the predictions made by the model proved to be quite accurate, showing it to be a viable processing algorithm. Similar observations were made about the performance of the system when evaluating real images taken with an HDR camera.
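Comparing predictions against ground truth in this way is typically scored with a full-reference image metric. A minimal sketch of one standard choice, peak signal-to-noise ratio (the specific metric here is an assumption, not stated in the article), on flattened 12-bit pixel values:

```python
import math

def psnr(predicted, reference, max_value=4095):
    """Peak signal-to-noise ratio in dB between a predicted
    tone-mapped image and its ground truth (higher is better;
    identical images score infinity)."""
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)
```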
The final neural network developed by the team had only 1,000 parameters, making it extremely small in the world of machine learning. Furthermore, it required roughly 9,000 floating point operations per image, which makes the team's goal of integrating it into a traditional image signal processor sound very reasonable.
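To give a sense of how small a 1,000-parameter budget is, here is some back-of-the-envelope arithmetic for a hypothetical tiny histogram network (the layer sizes are illustrative only, not the published TGTM architecture):

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Parameter count of a 1-D conv layer: weights plus biases."""
    return in_channels * out_channels * kernel_size + out_channels

def dense_params(in_features, out_features):
    """Parameter count of a fully connected layer."""
    return in_features * out_features + out_features

# An illustrative stack operating on a histogram input:
total = (
    conv1d_params(1, 8, 5)     # 48 params
    + conv1d_params(8, 16, 5)  # 656 params
    + dense_params(16, 16)     # 272 params
    + dense_params(16, 1)      # 17 params
)
```

Even generously sized layers like these stay under the 1,000-parameter mark, which is why the whole model can plausibly ride along inside an image signal processor.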