CarbonCall Aims to Make On-Device Large Language Models Greener, Faster, More Energy-Efficient
A carbon-aware function calling system, CarbonCall dynamically switches models and hardware power envelopes to minimize carbon footprint.
Researchers from Southern Illinois University and the University of Texas at Austin are trying to do something about the growing power needs of on-device large language model (LLM) operation — by making the models aware of their carbon footprint.
"Large language models (LLMs) enable real-time function calling in edge AI [Artificial Intelligence] systems but introduce significant computational overhead, leading to high power consumption and carbon emissions," the team explains of the problem it set out to solve. "Existing methods optimize for performance while neglecting sustainability, making them inefficient for energy-constrained environments. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized LLM adaptation."
The current AI boom is driven almost entirely by large language model (LLM) technology, in which inputs are converted into tokens and used to generate a string of the most statistically likely output tokens in response — typically presented to the user as a "conversation" with a chatbot, though also applicable to audio, image, and video inputs. It's an impressive trick, but one that relies on the misuse of vast troves of copyrighted data for training, results in something shaped like an answer rather than an actual trustworthy answer, and — possibly most importantly of all — is horrifyingly computationally expensive, and thus environmentally damaging.
CarbonCall is the team's attempt to address that latter problem — without making any of the others worse. Running on existing edge-AI hardware — tested on the NVIDIA Jetson AGX Orin computer-on-module, designed for high-performance embedded edge computing — CarbonCall dynamically adjusts the power envelope of the hardware on which it runs and switches between variants of the current model, including quantized versions that require fewer resources, based on real-time forecasts of the carbon intensity of the device's current power source.
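The general idea can be sketched as a simple policy that maps a forecast grid carbon intensity to a model variant and a hardware power cap. This is a minimal illustration only: the thresholds, variant names, and wattages below are hypothetical placeholders, not values from the paper.

```python
# Hypothetical sketch of carbon-aware model/power selection.
# Thresholds, variant names, and power caps are illustrative assumptions,
# not the actual policy used by CarbonCall.
from dataclasses import dataclass


@dataclass
class ExecutionPlan:
    model_variant: str  # which model weights to load
    power_cap_w: int    # hardware power envelope, in watts


def plan_for_intensity(carbon_gco2_per_kwh: float) -> ExecutionPlan:
    """Pick a model variant and power cap from forecast grid carbon intensity."""
    if carbon_gco2_per_kwh < 150:
        # Clean grid: run the full-precision model at a high power cap.
        return ExecutionPlan("llm-fp16", power_cap_w=50)
    if carbon_gco2_per_kwh < 400:
        # Moderate intensity: switch to an 8-bit quantized variant, lower cap.
        return ExecutionPlan("llm-int8", power_cap_w=30)
    # Carbon-heavy grid: smallest quantized variant, tightest power envelope.
    return ExecutionPlan("llm-int4", power_cap_w=15)


if __name__ == "__main__":
    plan = plan_for_intensity(320.0)
    print(plan.model_variant, plan.power_cap_w)  # llm-int8 30
```

In practice, the forecast would come from a carbon-intensity service for the local grid, and the power cap would be applied through the platform's power-management interface (on the Jetson, for example, via its configurable power modes).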
The researchers claim that this "carbon-aware execution strategy" delivers significant improvements: in testing, the carbon emissions of a large language model running on the Jetson AGX Orin were cut by up to 52 percent, overall power consumption was reduced by 30 percent, and execution time was cut by 30 percent — all without compromising the efficiency of the model.
"By combining dynamic tool selection, carbon-aware execution, and quantized LLM adaptation, CarbonCall minimizes power consumption and emissions while preserving response speed," the researchers say. "Compared to existing methods, CarbonCall achieves higher energy efficiency, making it a practical solution for sustainable agentic AI at the edge."
The team's work is available in a preprint on Cornell's arXiv server, under open access terms.
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.