Kneron Claims Its KL1140 Can Bring 120B-Parameter LLMs On-Device at One-Tenth the Cost of GPUs

Next-generation neural chips can outperform leading GPU-based accelerators while drawing as little as one-third the power, Kneron says.

Edge artificial intelligence (AI) specialist Kneron has announced its fourth-generation neural coprocessor, the KL1140, which, it claims, can match the performance of a high-end GPU when running large language models (LLMs) while drawing as little as one-third the power and costing a tenth as much.

"The twin threat of high costs and vast energy consumption means the status quo of AI computing is fundamentally unsustainable," claims Kneron founder and chief executive officer Albert Liu in support of the company's latest hardware. "The KL1140 is our response to the challenges of scaling LLMs in the cloud alone. By running advanced models at the edge, we're achieving a technical milestone that opens up entirely new applications for everyday devices, putting the power of LLMs directly into the hands of users."

Large language models sit at the heart of the current AI boom. Trained on vast input corpora broken down into tokens, they statistically select output tokens in response to input tokens, simulating conversation, question-and-answer exchanges, and even reasoning. As they grow more capable, though, they require increasing amounts of compute, with gigawatts of new data center capacity needing to come online in the next few years just to keep up with demand.
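That "statistical selection" of tokens can be illustrated with a toy sketch. In a real LLM, the probabilities below would come from a neural network conditioned on all the preceding tokens; the vocabulary and weights here are invented purely for illustration.

```python
import random

# Toy next-token distribution. In a real LLM these probabilities are
# produced by the model for every step, conditioned on prior context;
# these values are made up for illustration only.
vocab = ["the", "edge", "cloud", "chip"]
probs = [0.1, 0.5, 0.2, 0.2]

def sample_next_token(rng: random.Random) -> str:
    """Pick one token at random, weighted by its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)
# Generating text is just repeating this weighted draw, feeding each
# chosen token back in as fresh context (omitted in this toy sketch).
sequence = [sample_next_token(rng) for _ in range(5)]
print(sequence)
```

The expensive part in practice is not the sampling but computing those probabilities, which is exactly the workload Kneron says its chips move from the data center to the device.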

The Mamba-compatible KL1140, which Kneron teased last year with the promise of a 2025 launch, is designed to address this problem. Four KL1140 chips working in parallel, the company claims, can run a large language model sized at up to 120 billion parameters on-device rather than in the cloud β€” while drawing between one-half and one-third as much power as a competing NVIDIA or AMD GPU-based accelerator and at one-tenth of the hardware cost.

Kneron is positioning the chip not only as a means of addressing the power demands of LLM-based AI services but also of bringing those services out of the cloud and onto local devices. Suggested use cases include a self-contained security robot that can interpret natural-language commands and "recognize complex situations" without an internet connection, in-car decision-making, private edge AI assistants that never need to transmit data to remote cloud servers, and smart manufacturing systems.

"The arrival of the KL1140 is more than just another chip launch, it’s a tipping point in the journey towards practical, high-performance and sustainable AI," Liu claims. "By bringing intelligence to the edge, we're enabling developers and enterprises to create applications that were impossible before."

At the time of writing, Kneron had not publicly disclosed technical details or pricing for the new KL1140 chips, but it does say its energy-efficiency claims are backed by independent benchmarking carried out at the University of California, Berkeley.

More information is available on the Kneron website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.