CEVA's NeuPro-M Boasts a 5-15x Performance Gain Over Last-Generation Edge AI Parts

Packing a range of coprocessors, all designed with edge AI in mind, CEVA's new NeuPro-M comes with impressive performance claims.

Gareth Halfacree
2 years ago • Machine Learning & AI

CEVA has announced a next-generation processor architecture, dubbed NeuPro-M, with which it aims to boost edge AI and edge compute workloads, claiming a 5-15x performance gain over its last-generation equivalent depending on the workload.

"The artificial intelligence and machine learning processing requirements of edge AI and edge compute are growing at an incredible rate," says CEVA's Ran Snir. "With the power budget remaining the same for these devices, we need to find new and innovative methods of utilizing AI at the edge in these increasingly sophisticated systems. NeuPro-M is designed on the back of our extensive experience deploying AI processors and accelerators in millions of devices, from drones to security cameras, smartphones and automotive systems."

The NeuPro-M blends a range of function-specific coprocessors with load-balancing mechanisms designed to improve data flow, a key part of its claimed performance gain. At its heart is a main grid array with 4,000 multiply-accumulate (MAC) units offering mixed precision from 2 to 16 bits. Around it sit a Winograd transform engine, designed to cut convolution time in half while processing at eight bits with less than a 0.5 percent drop in precision; a sparsity engine, which skips zero-value weight or activation operations for up to a fourfold performance gain; a programmable vector processing unit supporting "all data types," from 32-bit floats down to two-bit binary neural networks (BNNs); built-in weight and data compression down to two bits for storage, with real-time decompression; and a two-level memory architecture designed to lower power consumption.
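To give a rough sense of what a sparsity engine does, the sketch below shows zero-skipping in a plain Python multiply-accumulate loop. It is purely illustrative and assumes nothing about CEVA's hardware beyond the basic idea of not issuing a MAC when an operand is zero; the function names and example data are invented for the demonstration.

```python
# Illustrative only: zero-skipping in a multiply-accumulate loop,
# the basic idea behind a sparsity engine. Not CEVA's implementation.

def mac_dense(weights, activations):
    """Multiply-accumulate over every element, zeros included."""
    total, ops = 0, 0
    for w, a in zip(weights, activations):
        total += w * a
        ops += 1
    return total, ops

def mac_sparse(weights, activations):
    """Skip any pair where the weight or activation is zero."""
    total, ops = 0, 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # no MAC issued for zero operands
        total += w * a
        ops += 1
    return total, ops

# A heavily pruned weight vector (75 percent zeros) gives the same result
# with a quarter of the MAC operations, which is where the "up to fourfold"
# style of speed-up claim for sparse networks comes from.
weights     = [3, 0, 0, 0, -2, 0, 0, 0]
activations = [1, 5, 2, 7,  4, 0, 9, 1]
print(mac_dense(weights, activations))   # (-5, 8)
print(mac_sparse(weights, activations))  # (-5, 2)
```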

In other words: The NeuPro-M is tailor-made for machine learning and artificial intelligence operations, both on existing networks and novel workloads. Between the Winograd transform engine, the sparsity engine, and the use of low-resolution 4x4-bit activations, the company claims a threefold reduction in cycle counts for popular networks including ResNet50 and YoloV3. An offline compression tool, meanwhile, is claimed to boost the parts' performance-per-watt by a factor of five to 10 for common benchmark workloads, with, CEVA says, "very minimal impact on accuracy."
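As a loose illustration of what squeezing weights down to two bits involves, here is a minimal Python sketch of packing and unpacking 2-bit weight codes. The uniform four-level quantizer is an assumption made for the example; CEVA has not detailed the scheme its offline compression tool actually uses.

```python
# Illustrative only: packing float weights into 2-bit codes and expanding
# them again at run time. The uniform 4-level quantizer is an assumed
# stand-in, not CEVA's compression scheme.
import numpy as np

def compress_2bit(weights):
    """Map each weight to one of four levels and pack four codes per byte."""
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / 3 if hi > lo else 1.0
    codes = np.clip(np.round((weights - lo) / scale), 0, 3).astype(np.uint8)
    packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)
    return packed, lo, scale

def decompress_2bit(packed, lo, scale):
    """Unpack the 2-bit codes and reconstruct approximate weights."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)
    return codes.astype(np.float32) * scale + lo

weights = np.array([-1.0, -0.4, 0.1, 0.9, 0.5, -0.8, 0.0, 1.0], dtype=np.float32)
packed, lo, scale = compress_2bit(weights)
print(packed.nbytes, "bytes instead of", weights.nbytes)   # 2 bytes instead of 32
print(decompress_2bit(packed, lo, scale))                  # approximate originals
```

The storage saving is the point: 32-bit weights shrink sixteenfold, at the cost of reconstruction error that the offline tool would need to keep small.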

The company is launching the architecture with two preconfigured cores: the NPM11, a single NeuPro-M engine delivering up to 20 TOPS at 1.25GHz; and the NPM18, which combines eight NeuPro-M engines to deliver up to 160 TOPS at the same clock speed. Power efficiency, meanwhile, is stated at "up to 24 TOPS per watt," measured on a run through the ResNet50 network, during which a single NPM11 showed a fivefold performance boost and a sixfold reduction in memory bandwidth compared to the company's last-generation equivalent, CEVA claims.
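Taking the quoted figures at face value, the headline numbers are consistent with simple arithmetic, as the small check below shows; the implied power figure assumes peak throughput and peak efficiency apply at the same time, which CEVA does not state.

```python
# Sanity check of the quoted peak figures; the inputs are CEVA's claims.
single_core_tops = 20      # NPM11: one NeuPro-M engine at 1.25GHz
npm18_cores = 8            # NPM18: eight NeuPro-M engines

print(npm18_cores * single_core_tops)   # 160 TOPS, matching the NPM18 figure

tops_per_watt = 24         # "up to", measured on ResNet50
# Rough implied power for a single core, *if* peak TOPS and peak TOPS/W
# coincided; that assumption is ours, not CEVA's.
print(round(single_core_tops / tops_per_watt, 2))   # ~0.83W
```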

The NeuPro-M is now being licensed to "lead customers," CEVA has confirmed, and will become more broadly available in the second quarter of the year. More information is available on the company's website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.