It’s About Time!
MovieNet, an AI algorithm inspired by the human brain, efficiently analyzes video streams to understand how objects change over time.
Before computer vision systems can truly understand the world around them, they will need to learn to process visual data in new ways. Existing tools generally focus on the individual frames of a video stream, where they may, for example, locate objects of interest. As useful as this capability is for numerous applications, it leaves out a mountain of crucial information. Understanding each frame in isolation misses important features, like how an object moves. Without that knowledge, artificial systems will continue to struggle to grasp how objects change over time and interact with one another.
In contrast to today’s artificial intelligence models, the human brain has no difficulty in understanding how scenes unfold over time. This inspired a pair of researchers at the Scripps Research Institute to build a novel computer vision system that works more like a human brain. Their approach, called MovieNet, is capable of understanding complex and changing scenes, which could be important to the future development of tools in the areas of medical diagnostics, self-driving vehicles, and beyond.
The researchers achieved this breakthrough by studying neurons in a visual processing region of the tadpole brain known as the optic tectum, which are adept at detecting and responding to moving stimuli. As it turns out, these neurons interpret visual stimuli in short sequences, typically 100 to 600 milliseconds long, and assemble them into coherent, flowing scenes. Each neuron specializes in detecting specific patterns, such as shifts in brightness, rotations, or movements, which act like individual pieces of a larger visual puzzle.
By studying how these neurons encode information, the researchers created a machine learning algorithm that replicates this process. MovieNet breaks down video clips into essential visual cues, encoding them into compact, interpretable data sequences. This allows the model to focus on the critical aspects of motion and change over time, much like the brain does. Additionally, the algorithm incorporates a hierarchical processing structure, tuning itself to recognize temporal patterns and sequences with exceptional efficiency. This design not only allows MovieNet to identify subtle differences in dynamic scenes but also compresses data effectively, reducing computational requirements while maintaining high accuracy.
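To make the idea concrete, here is a minimal, hypothetical sketch of the kind of pipeline described above: it splits a clip into short temporal windows, echoing the 100 to 600 millisecond sequences observed in tectal neurons, and reduces each window to a compact cue vector. The window length, the choice of cues (brightness shift and motion energy), and every function name below are illustrative assumptions; the published model's actual encoding is more sophisticated than this sketch.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Window length, cues, and quantization are assumptions for demonstration.

import numpy as np

def split_into_windows(frames, fps=30, window_ms=200):
    """Split a (T, H, W) grayscale frame stack into short temporal
    windows, echoing the 100-600 ms sequences seen in tectal neurons."""
    frames_per_window = max(1, int(fps * window_ms / 1000))
    return [frames[i:i + frames_per_window]
            for i in range(0, len(frames) - frames_per_window + 1,
                           frames_per_window)]

def encode_window(window):
    """Reduce one window to a compact cue vector: an overall brightness
    shift plus coarse motion energy. Real cues would be far richer."""
    diffs = np.diff(window.astype(np.float32), axis=0)
    brightness_shift = diffs.mean()        # global dimming or brightening
    motion_energy = np.abs(diffs).mean()   # how much anything moved
    return np.array([brightness_shift, motion_energy])

def encode_clip(frames, fps=30, window_ms=200):
    """Turn a whole clip into a short sequence of cue vectors -- a
    compressed stand-in for the raw pixel stream."""
    return np.stack([encode_window(w)
                     for w in split_into_windows(frames, fps, window_ms)])

# Example: a synthetic 2-second, 30 fps clip of 64x64 noise frames.
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(60, 64, 64), dtype=np.uint8)
codes = encode_clip(clip)
print(codes.shape)  # (10, 2) -- far smaller than the raw (60, 64, 64)
```

The payoff of this style of encoding is the compression: downstream pattern recognition operates on a handful of cue vectors per clip rather than on every pixel of every frame, which is what lets a brain-inspired design cut computational requirements.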
By applying these biological principles, the researchers found that MovieNet could transform complex visual information into manageable, brain-like representations, enabling it to excel in real-world tasks that require a detailed understanding of motion and change. When tested with video clips of tadpoles swimming under a variety of conditions, MovieNet outperformed both human observers and leading AI models, achieving an accuracy of 82.3 percent, a significant improvement over Google's GoogLeNet, which reached only 72 percent accuracy despite being a more computationally intensive algorithm trained on a much larger dataset.
The team's innovative approach also makes MovieNet more environmentally sustainable than traditional AI, as it reduces the need for extensive data and processing power. Its ability to emulate brain-like efficiency positions it as an important tool across various fields, including medicine and drug screening. For instance, MovieNet might one day identify early signs of neurodegenerative diseases by detecting subtle motor changes, or track cellular responses in real time during drug testing, areas where current methods often fall short.