It’s All the Same to HTVM

HTVM simplifies and optimizes the deployment of DNNs to heterogeneous tinyML platforms with different CPUs and AI hardware accelerators.

Nick Bild
The DIANA architecture used in benchmarking HTVM (📷: J. Van Delm et al.)

Artificial intelligence (AI)-powered smart applications are increasingly moving to an edge computing paradigm in which processing takes place either on- or near-device. By removing the dependence on cloud-based computing resources, these applications benefit from enhanced security and privacy, as well as lower latency, which makes them more responsive. As such, hardware manufacturers are introducing new on-device AI hardware accelerators at a rapid clip to support edge computing use cases. These chips have proven to be quite useful, as they generally improve inference speeds significantly while also reducing energy use.

These accelerators have a wide variety of architectures. Some support one set of deep neural network (DNN) operators, while others support a different set. Bit precision, data layout, memory capacity, and many other parameters vary wildly from accelerator to accelerator. This variety gives developers many options, which is good, but it also makes deploying AI models very challenging. As the number of platforms grows, supporting them all quickly becomes a nightmare for developers.

A team led by researchers at KU Leuven in Belgium is trying to streamline the deployment process with a tool they call HTVM, which was designed specifically to make DNN deployment simpler and more efficient on heterogeneous tinyML platforms. The HTVM toolchain handles the details of deploying to platforms with microcontroller cores, a variety of hardware accelerators, and differing memory architectures.
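
HTVM builds on Apache TVM, so its front end resembles a standard TVM compile flow. Purely as an illustrative sketch of that upstream flow, and not HTVM's own entry points or accelerator targets, compiling a model might look like the following, assuming the network has been exported to ONNX and targeting TVM's plain C backend as one would for a bare microcontroller core:

```python
# Illustrative sketch of the upstream Apache TVM flow that HTVM extends;
# HTVM's own entry points and accelerator targets live in its repository.
import onnx
import tvm
from tvm import relay

# Load a trained network exported to ONNX ("model.onnx" is a placeholder).
model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(model)

# Compile for TVM's plain C backend, the kind of target used for a
# microcontroller core without an accelerator.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="c", params=params)

# Bundle the generated C sources for cross-compilation to the board.
lib.export_library("model.tar")
```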

HTVM works by extending the TVM compilation process with a memory-planning backend called DORY. This backend generates code that optimizes data movement within the hardware, making the best use of the limited memory available on these tiny devices. By focusing on how DNN layers are tiled — divided and processed in smaller parts — HTVM ensures that even large layers can be executed efficiently on memory-constrained devices, resulting in significant speed improvements.
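
To make the tiling idea concrete, here is a minimal, self-contained sketch of the kind of calculation a memory planner performs. It is not HTVM's or DORY's actual algorithm; it simply searches for the largest square output tile of a convolution layer whose working set (inputs, weights, and outputs) still fits in a small L1 scratchpad. The layer shape and the 64KB budget are hypothetical:

```python
# Hypothetical example of memory-constrained tiling, not code from HTVM/DORY.

def tile_fits(h, w, c_in, c_out, k, bytes_per_elem, l1_budget):
    """Check whether one output tile's working set fits in L1.

    An h x w tile of outputs over c_out channels needs the matching
    (h + k - 1) x (w + k - 1) input patch over c_in channels, plus the
    k x k weights for every (c_in, c_out) channel pair.
    """
    inputs = (h + k - 1) * (w + k - 1) * c_in
    weights = k * k * c_in * c_out
    outputs = h * w * c_out
    return (inputs + weights + outputs) * bytes_per_elem <= l1_budget

def largest_square_tile(c_in, c_out, k, bytes_per_elem, l1_budget, max_hw=64):
    """Grow a square output tile until it no longer fits in the budget."""
    best = 0
    for hw in range(1, max_hw + 1):
        if tile_fits(hw, hw, c_in, c_out, k, bytes_per_elem, l1_budget):
            best = hw
    return best

# Example: an int8 3x3 convolution from 32 to 64 channels with 64 KB of L1
# can process its output in 21 x 21 tiles.
print(largest_square_tile(c_in=32, c_out=64, k=3, bytes_per_elem=1,
                          l1_budget=64 * 1024))
```

Each layer is then executed tile by tile, with data shuttled between the larger off-accelerator memory and the scratchpad, which is why good tiling decisions translate directly into speed on memory-constrained devices.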

HTVM has been extensively tested and benchmarked on a platform called DIANA, which includes both digital and analog DNN accelerators. The tests showed substantial speed-ups, with performance close to the theoretical maximum of the hardware. HTVM can also deploy entire networks to the accelerators, reducing reliance on the main CPU and thereby decreasing overall processing time.

The toolchain's code is open source so that other developers can use and contribute to it. The GitHub repository has build instructions, and even a Docker image to make the initial setup as easy as possible. Be sure to take a look if you want to deploy a machine learning model to a complex tinyML platform.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.