Power-hungry AI: Researchers evaluate energy consumption across models

A new tool designed by researchers at the University of Michigan allows users to compare the energy efficiency of AI-powered language models.
Photo of the Michigan Academic Computing Center (MACC): a long hallway lined with data servers on either side, enclosed by a glass door bearing a block M.
The Michigan Academic Computing Center (MACC) hosts data for University of Michigan units and affiliates.

The tech industry is a major and growing contributor to global energy consumption. Data centers, in particular, are responsible for an estimated 2% of electricity use in the U.S. and consume up to 50 times more energy than an average commercial building. That share is only trending up as increasingly popular large language models (LLMs) are deployed in data centers and consume huge amounts of data. Based on current data center investment trends, LLMs could produce emissions equivalent to five billion U.S. cross-country flights in one year.

In response to this issue, a team of researchers at the University of Michigan has developed the ML.ENERGY Leaderboard, which evaluates and ranks open-source LLMs according to how much energy they consume. The Leaderboard grew out of the team’s prior development of Zeus, a framework for measuring and optimizing the energy consumption of the deep learning models that power modern AI. It aims to give developers and end users alike a holistic view of how much energy LLMs consume, so that they can factor in any given application’s energy demands when deciding whether to use it.
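For readers curious about what measuring a model's energy consumption can involve, here is a minimal sketch in Python. It is not the Zeus implementation or its API. It only illustrates the general idea that energy is power integrated over time, and it assumes a hypothetical list of power readings (in watts) sampled while a model generates a response. The function name and all of the readings are made up for illustration.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) using the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power readings must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Average the two neighboring power readings and multiply by the elapsed time.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical readings: hardware sampled once per second during one response.
timestamps = [0.0, 1.0, 2.0, 3.0]
power = [100.0, 200.0, 200.0, 100.0]
response_energy = energy_joules(timestamps, power)  # energy in joules
```

Real measurement tools typically read energy or power counters exposed by the hardware rather than hand-entered samples, but the arithmetic behind a per-response energy figure follows the same pattern.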

“Recent years have seen a huge rise in large language models, but the conversation has revolved almost exclusively around performance,” said Jae-Won Chung, doctoral student in computer science and engineering at U-M. “No one is really thinking about energy consumption.”

The information that has emerged surrounding AI’s energy use has been incomplete at best. The ML.ENERGY Leaderboard is the first known initiative to dive deeper into the dynamics influencing energy use among open-source LLMs and identify which models are consuming more or less energy and why.

“This is the first analysis of its kind,” said Mosharaf Chowdhury, Morris Wellman Faculty Development Professor of Computer Science and Engineering at U-M. “The literature that does exist on AI energy consumption is either anecdotal or based on very basic calculations. No one has tried to drill down to the specifics in a systematic and principled way to determine how much energy AI is consuming.”

A scatter plot from the leaderboard showing the energy use of the LLMs on the Y axis and response length on the X axis. The graph shows that, in general, energy consumption increases along with response length.
The ML.ENERGY Leaderboard allows users to select various inputs by which to evaluate and compare LLMs and their energy consumption. The above graph, for instance, shows where various LLMs rank when comparing energy use against response length.
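Because energy use tends to rise with response length, one way to compare models on a more even footing is to normalize energy by the number of tokens generated. The sketch below assumes that approach for illustration only. It does not reproduce the Leaderboard's own metrics or code, and the model names and figures are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    model: str              # hypothetical model name
    energy_j: float         # average energy per response, in joules
    response_tokens: float  # average response length, in tokens

    @property
    def joules_per_token(self) -> float:
        return self.energy_j / self.response_tokens


def rank_by_energy_per_token(rows: List[Measurement]) -> List[Measurement]:
    """Return measurements sorted from least to most energy per generated token."""
    return sorted(rows, key=lambda row: row.joules_per_token)


# Invented numbers, for illustration only.
measurements = [
    Measurement("model-a", energy_j=600.0, response_tokens=300.0),
    Measurement("model-b", energy_j=450.0, response_tokens=150.0),
    Measurement("model-c", energy_j=900.0, response_tokens=600.0),
]
ranking = [row.model for row in rank_by_energy_per_token(measurements)]
```

In this made-up data, the model that uses the most energy per response also writes the longest responses, so it ranks as the most efficient once energy is divided by length. This is why comparing energy against response length can change which model looks most economical.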

Accompanying the Leaderboard is an interactive online serving system that gives users a blind, side-by-side comparison of the content generated by two different LLMs in response to the same prompt. After deciding which response they prefer, the user is shown the two models’ relative energy consumption and then has the opportunity to revise their choice. The team hopes that the tool will serve as a guide for those seeking to balance performance and energy consumption in their choice of LLMs, ultimately promoting the use of more energy-efficient models, while also providing information on how users choose among LLMs.

“The objective of the serving system is to learn about how people perceive large language models,” said Jiachen (Amber) Liu, doctoral student in computer science and engineering at U-M. “We want to parse how users balance response quality and overall performance with energy considerations, and how much that affects their choice.”

A side-by-side comparison of two anonymous LLMs from the Leaderboard's interactive component. The user has asked the two models to "Write a song that welcomes new students to the University of Michigan," and both models have generated a response.
In the interactive “Colosseum” feature of the Leaderboard, the user gives a prompt and two anonymous LLMs generate responses. The user then selects which response was better.
The same image as above, but with the energy consumption and model names revealed.
After the user selects a response, the relative energy consumption of the two models and the names of the models are revealed. The user can then decide whether the increased energy consumption is “worth” the better response, or whether the lower-consumption response is adequate.

The information gleaned from the Leaderboard is useful to multiple stakeholders, whether the developer creating or modifying a model, the civil engineer designing and building the data center, or the end user. Considering the tremendous energy burden data centers face, any increase in energy efficiency translates into significant cost savings, and those savings trickle down to everyday people as well.

“This initiative goes far, far beyond computer science,” said Chowdhury. “We not only want to collect the numbers but also build the tools so that anyone can use and incorporate this software with their existing systems.”

The ML.ENERGY Leaderboard is part of a broader initiative spearheaded by the SymbioticLab at U-M. In addition to Chowdhury, Chung, and Liu, the Leaderboard team also includes Zhiyu (Julius) Wu and Ding (Eric) Ding, undergraduate students in computer science at U-M.