Arm Announces Updated AI and Deep Learning Framework for IoT Hardware

Disclaimer: Opinions expressed here are my own and not my employer’s. I work for Arm!

Rex St. John
3 years ago · Machine Learning & AI

This week, Arm announced an update to the Compute Library that includes a variety of new functions to help AI developers make better use of Arm compute hardware. First announced earlier in the year, the Compute Library gives embedded developers a toolkit for building AI-enabled, low-cost IoT devices. I wanted to take a minute to talk about why this is an interesting development for hardware developers and makers.

Compute Library Enables AI for All

AI, perceptual computing, and deep learning on hardware devices have become increasingly hot topics as tech companies seeking a competitive edge in next-generation IoT use cases have poured billions into new and more exotic forms of silicon.

While people may be familiar with ASICs, GPUs and FPGAs for deep learning and AI-related tasks, what many don’t realize is that a sizable amount of AI-enabling hardware on the market comes in the form of GPUs packaged on mobile SoCs. Arm Mali GPUs, for example, shipped in nearly 1 billion devices last year (2016), mostly mobile phones.

The picture becomes more interesting once you realize that these same mobile SoCs (initially produced for handset makers) will ultimately make their way down the value chain and flow out into IoT, creating a tremendous “long tail” of low-cost devices capable of AI calculations.

This is where the Arm Compute Library comes into the picture. The Compute Library optimizes and accelerates a variety of common, useful algorithms and functions frequently used in computer vision for Arm architectures, helping to enable this long tail of gadgets.

Here are a few highlights of what is contained in the latest update:

OpenCL C (targeting Mali GPUs):

  • Bounded ReLU
  • Depthwise convolution (used in MobileNet)
  • De-quantization
  • Direct convolution 1x1
  • Direct convolution 3x3
  • Direct convolution 5x5
  • Flattening for 3D tensor
  • Floor
  • Global pooling (used in SqueezeNet)
  • Leaky ReLU
  • Quantization
  • Reduction operations
  • ROI pooling

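To give a concrete sense of what the activation functions above actually compute, here is a plain-Python sketch of the element-wise math behind bounded ReLU and leaky ReLU. This is only an illustration of the formulas; the library itself implements these as optimized OpenCL C kernels for Mali GPUs, and the `upper` and `alpha` defaults below are my own assumed example values, not the library's API.

```python
def bounded_relu(x, upper=6.0):
    """Bounded ReLU: clamp x to the range [0, upper].
    The upper bound is a layer parameter (6.0 gives the common "ReLU6")."""
    return min(max(0.0, x), upper)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: pass positive values through unchanged, but scale
    negative values by a small slope alpha instead of zeroing them."""
    return x if x >= 0.0 else alpha * x

# Applied element-wise over a tensor:
values = [-2.0, -0.5, 0.0, 3.0, 10.0]
print([bounded_relu(v) for v in values])   # clamps 10.0 down to 6.0
print([leaky_relu(v) for v in values])     # keeps a small negative slope
```

The bounded variant matters on quantized hardware because capping the output range makes fixed-point representation easier; the leaky variant avoids "dead" neurons by keeping a nonzero gradient for negative inputs.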
CPU (NEON):

  • Bounded ReLU
  • Direct convolution 5x5
  • De-quantization
  • Floor
  • Leaky ReLU
  • Quantization
  • New functions with fixed point acceleration
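Quantization and de-quantization, which appear in both lists, map floating-point tensors to 8-bit integers and back so that networks can run in cheap fixed-point arithmetic. As a rough sketch, here is the affine quantization scheme commonly used for this (the `scale` and `zero_point` parameters and value ranges are assumptions for illustration, not the library's exact interface):

```python
def quantize(x, scale, zero_point):
    """Map a float to an unsigned 8-bit value:
    q = round(x / scale) + zero_point, clamped to [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    """Approximate inverse: x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale

# Example: scale of 0.05 with zero_point 128 covers roughly [-6.4, 6.35].
scale, zero_point = 0.05, 128
q = quantize(1.0, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The round trip is lossy (values snap to the nearest multiple of `scale`), which is exactly why the CPU list above also includes new functions with fixed-point acceleration: once tensors are in this form, the heavy math can stay in integer arithmetic.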

Also announced was a series of micro-architecture optimizations:

When we started the Compute Library project, our primary purpose was to share a comprehensive set of low level functions for computer vision and machine learning that provided good performance — but most importantly that was reliable and portable. The library is there to reduce cost and time efforts by developers and partners targeting Arm processors, whilst at the same time, also to behave well across the many system configurations that our partners implement. This is why we chose to use NEON intrinsics and OpenCL C as the target languages. However, there are cases where it is critical to extract every ounce of performance from the hardware. We therefore looked at adding to the library low-level primitives optimised using hand-coded assembly tailored to the micro-architecture of the target CPU.

If any of these areas interest you, take a look at the blog post and learn more.
