Neural networks have been on the move — from the cloud to the edge, and all the way down to embedded devices. Running inference with large networks designed for non-trivial tasks can demand so much compute that the work gets pushed back up the chain to the edge, or even the cloud. But as processing moves further from the device, much of the data from on-board sensors ends up discarded because of bandwidth and cost constraints. Optimizing networks to run efficiently on-device can therefore be highly advantageous.
On-device inference has recently received a boost from work done by NXP Semiconductors. NXP has extended Facebook’s open-source Glow neural network compiler with hardware-specific optimizations targeting Arm Cortex-M cores and the Cadence Tensilica HiFi 4 DSP, as found in the i.MX RT series of microcontrollers, including the i.MX RT685, i.MX RT1050, and i.MX RT1060. These optimizations deliver a 2-3x performance improvement on these microcontrollers compared with the standard version of Glow.
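For context, Glow is an ahead-of-time compiler: it takes a trained model and emits a self-contained object-code "bundle" that can be linked into an MCU firmware image, with no interpreter on the device. A minimal sketch of that flow, using Glow's `model-compiler` tool, might look like the following. The model filename and the cross-compilation target flags shown here are illustrative assumptions; consult the Glow and eIQ documentation for the exact options supported by your toolchain.

```shell
# Hypothetical example: ahead-of-time compile an ONNX model into a
# static bundle for a Cortex-M core (model name and target flags are
# placeholders, not taken from the article).
model-compiler \
    -backend=CPU \
    -model=lenet_mnist.onnx \
    -emit-bundle=bundle_out \
    -target=arm -mcpu=cortex-m7

# The bundle_out directory then typically contains an object file,
# a C header declaring the inference entry point, and a weights file,
# all of which get linked into the MCU application.
```

The key design point is that all graph optimization and code generation happens on the host at build time, which is what makes hardware-specific tuning of the generated code (as in NXP's extensions) possible without adding runtime overhead on the microcontroller.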
The beefy yet low-cost, low-power i.MX RT microcontrollers targeted by these optimizations run at clock speeds from 600 MHz to 1 GHz. Running highly optimized neural network models on these chips opens up many new possibilities for TinyML.
The new functionality is included in NXP’s free eIQ Machine Learning Software Development Environment, which ships as part of the MCUXpresso SDK.