Lighten Up!

AI models may be getting bigger, but a new visual tracker called HiT delivers top-tier performance without the huge footprint.

Nick Bild
5 months ago · AI & Machine Learning
Visual tracking algorithms in action (📷: B. Kang et al.)

The current AI summer is scorching hot, and that has everyone’s expectations running high. There is a feeling that major innovations, like artificial general intelligence, might be right around the corner, even if, in reality, they are much more likely to be many years away. This excitement has also gripped researchers in the field, who are scrambling to meet people’s lofty expectations while the summer sun continues to shine.

Building the next big thing involves moving fast and creating bigger and better things all the time. When your latest model already draws as much power as a small town, what does it matter if you add a few measly billion more parameters to it? If it performs better, that is all that matters, right? Strike while the iron is hot, or be a footnote in tomorrow’s history books!

This prevailing attitude is causing the field to advance by leaps and bounds, so in some ways, it would be hard to argue against it. But we must not forget that there is also room to optimize the latest algorithms. It might not be as glamorous a job, but if no one can actually run the models because of their extravagant computational requirements, their real-world impact will be limited.

A team at Dalian University of Technology recognized the importance of shrinking the hardware requirements of top-tier models, so they set their sights on transformer-based visual trackers. These algorithms are essential for everything from autonomous driving to robotic vision, but they are also among the biggest resource hogs, which means actually running them onboard a robot or vehicle at a reasonable frame rate is a big challenge.

To address this, the researchers developed HiT, a family of efficient visual trackers that maintain strong performance while dramatically improving speed and computational efficiency. The key innovation behind HiT lies in its Bridge Module, which fuses high-level semantic information with low-level fine-grained details. This helps compensate for the loss of spatial resolution commonly caused by high-stride downsampling in lightweight transformer backbones. Additionally, HiT incorporates a novel dual-image position encoding technique that simultaneously encodes the positional information of both the target object (template) and the surrounding scene (search area), enabling more accurate tracking.
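To make the Bridge Module idea concrete, here is a minimal sketch, assuming a simple fusion design: a coarse, semantically rich feature map from a heavily downsampled transformer backbone is upsampled and combined with a finer, low-level feature map to recover spatial detail. The layer choices and channel sizes below are illustrative assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BridgeModule(nn.Module):
    """Illustrative fusion of high-level (coarse) and low-level (fine) features."""
    def __init__(self, high_channels: int, low_channels: int, out_channels: int):
        super().__init__()
        # Project both feature maps into a common space, then fuse with a 3x3 conv.
        self.high_proj = nn.Conv2d(high_channels, out_channels, kernel_size=1)
        self.low_proj = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        self.fuse = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, high_feat: torch.Tensor, low_feat: torch.Tensor) -> torch.Tensor:
        # Upsample the semantic map to the resolution of the detailed map.
        high_up = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = self.high_proj(high_up) + self.low_proj(low_feat)
        return self.fuse(F.relu(fused))

# Example: fuse a 16x-downsampled semantic map with an 8x-downsampled detail map.
bridge = BridgeModule(high_channels=256, low_channels=128, out_channels=256)
high = torch.randn(1, 256, 16, 16)   # coarse, semantically rich features
low = torch.randn(1, 128, 32, 32)    # finer spatial detail
print(bridge(high, low).shape)       # torch.Size([1, 256, 32, 32])
```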

On the NVIDIA Jetson AGX platform, HiT runs at an impressive 61 frames per second (fps) while securing a competitive 64.6% AUC score on the LaSOT benchmark. These results outpace all prior efficient visual trackers.

The team also introduced DyHiT, a dynamic tracker that smartly adapts its computational strategy based on the complexity of each scene. Using a lightweight feature-driven router, DyHiT determines whether a fast, shallow processing route is sufficient or if deeper, more complex analysis is needed. This divide-and-conquer method conserves computational resources in simple scenarios while retaining high accuracy for complex ones.
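The routing idea can be pictured with a rough sketch like the one below, in which a tiny head scores scene difficulty from early features; easy frames exit through a shallow path, while hard frames continue through heavier processing. The module names, threshold, and layer sizes are assumptions for illustration, not the published DyHiT architecture.

```python
import torch
import torch.nn as nn

class DynamicTracker(nn.Module):
    """Toy early-exit tracker: a lightweight router picks a fast or deep path."""
    def __init__(self, feat_dim: int = 128, threshold: float = 0.7):
        super().__init__()
        self.threshold = threshold
        self.early_stage = nn.Linear(feat_dim, feat_dim)            # stand-in shallow path
        self.deep_stage = nn.Sequential(                            # stand-in heavy path
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # Feature-driven router: pooled features -> confidence that the easy path suffices.
        self.router = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                    nn.Linear(32, 1), nn.Sigmoid())
        self.head = nn.Linear(feat_dim, 4)                          # box regression head

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        early = self.early_stage(features)
        confidence = self.router(early.mean(dim=0, keepdim=True))
        if confidence.item() > self.threshold:
            return self.head(early.mean(dim=0))                     # easy frame: stop early
        return self.head(self.deep_stage(early).mean(dim=0))        # hard frame: full compute

tracker = DynamicTracker()
tokens = torch.randn(64, 128)        # illustrative per-frame feature tokens
print(tracker(tokens).shape)         # torch.Size([4]) -- a predicted box
```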

The fastest DyHiT variant clocks in at a blazing 111 fps on the same Jetson hardware, with only a minor dip in AUC to 62.4%. This balance between speed and performance is a major leap forward for deploying AI in real-world environments where power and processing budgets are tight.

Beyond these new models, the team also devised a training-free acceleration technique that turbocharges existing high-performance trackers. By integrating DyHiT’s efficient routing mechanism, popular trackers like SeqTrack-B256 can now run up to 2.7 times faster without sacrificing accuracy. This clever plug-in approach allows developers to squeeze more out of their existing models without needing costly retraining or architectural overhauls. Taken together, these advances may make high-performance AI more accessible and practical in the near future.
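In spirit, the plug-in acceleration amounts to wrapping an already-trained tracker with a router so that confidently "easy" frames take a cheap approximate path and only hard frames invoke the full model. The sketch below shows that control flow only; `fast_estimate` and `full_track` are hypothetical stand-ins, not SeqTrack's real API.

```python
from typing import Callable
import torch

def routed_track(frame_feats: torch.Tensor,
                 router: Callable[[torch.Tensor], float],
                 fast_estimate: Callable[[torch.Tensor], torch.Tensor],
                 full_track: Callable[[torch.Tensor], torch.Tensor],
                 threshold: float = 0.7) -> torch.Tensor:
    """Return a box from the cheap path when the router is confident,
    otherwise fall through to the original heavy tracker. No retraining needed."""
    if router(frame_feats) > threshold:
        return fast_estimate(frame_feats)   # easy scene: skip most of the compute
    return full_track(frame_feats)          # hard scene: run the full tracker
```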

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.