Ambarella Promises High-Efficiency Generative AI, LLMs with the New N1 System-on-Chip

The new N1 can run the Llama2-13B model at 25 output tokens per second in a mere 50W, the company promises.

Edge artificial intelligence (edge AI) specialist Ambarella has announced a new system-on-chip, the N1, which it claims can run multi-modal generative AI large language models (LLMs) on-device "at a fraction of the power" required by rival GPU-based systems.

"Generative AI networks are enabling new functions across our target application markets that were just not possible before," says Les Kohn, Ambarella co-founder and chief technical officer, of the company's latest chip design, which aims at server-class performance. "All edge devices are about to get a lot smarter, with our N1 series of SoCs enabling world-class multi-modal LLM processing in a very attractive power/price envelope."

Ambarella is looking to tame the energy demands of generative AI, promising a chip that can run popular LLMs in a 50W power envelope. (📷: Ambarella)

The N1 system-on-chip family is built, the company explains, around its CV3-HD architecture, which was originally designed for autonomous driving systems but is here put to work running LLMs such as those underpinning popular chatbot services like OpenAI's ChatGPT or Google's Bard.

LLMs are typically run on GPU-based accelerators which consume hundreds of watts of power. Ambarella says the N1 can run the same models on-device far more efficiently, drawing just 50W to run the Llama2-13B model at 25 output tokens per second. It's enough, the company says, for the all-in-one chip to drive workloads including contextual searches of video footage, robot control through natural language commands, and "AI helpers" for everything from image generation to code generation.
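To put the headline figures in perspective, the claimed 50W and 25 output tokens per second can be turned into an energy cost per token. The short Python sketch below does that arithmetic. It assumes the 50W figure is the sustained draw while generating and that the token rate is for a single stream, neither of which Ambarella has detailed; the function and constant names are illustrative only.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: a watt is a joule per second,
    so J/token = W / (tokens/s)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


def tokens_per_watt_hour(power_watts: float, tokens_per_second: float) -> float:
    """Tokens generated per watt-hour of energy (1 Wh = 3600 J)."""
    return 3600.0 / joules_per_token(power_watts, tokens_per_second)


# Vendor-quoted figures for Llama2-13B on the N1 (claims, not measurements).
N1_POWER_W = 50.0
N1_TOKENS_PER_S = 25.0

print(joules_per_token(N1_POWER_W, N1_TOKENS_PER_S))      # 2.0
print(tokens_per_watt_hour(N1_POWER_W, N1_TOKENS_PER_S))  # 1800.0
```

On those assumptions, the claim works out to roughly 2 joules per output token, or about 1,800 tokens per watt-hour. The same two functions can be applied to any other accelerator for which both a power draw and a token rate are published.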

The new N1 builds on the company's CV3-HD platform, originally built for autonomous vehicles. (📷: Ambarella)

In addition to support in its Cooper Developer Platform, Ambarella has ported a range of popular models to the N1, including Llama-2 and the Large Language and Vision Assistant (LLaVA), which is said to offer multi-modal vision analysis for up to 32 camera sources when running on the N1.

The company is showcasing the N1 SoC and its LLM capabilities at the Consumer Electronics Show (CES 2024) in Las Vegas this week; pricing and availability have not yet been confirmed. More information is available on the Ambarella website.

machine learning

artificial intelligence

computer vision

Gareth Halfacree

Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.
