Toward Cross-Layer Energy Optimizations in AI Systems
Jae-Won Chung, Nishil Talati, Mosharaf Chowdhury
TL;DR
The paper addresses the escalating energy footprint of AI systems, especially GenAI, and proposes cross-layer energy optimization spanning software and hardware. It introduces a narrow-waist interface and the time-energy Pareto frontier as core abstractions to decouple optimization while preserving end-to-end energy considerations. The contributions include a Pareto-frontier–based framework for end-to-end energy modeling, optimization approaches, and what-if simulations, enabling hardware and software to co-develop more energy-efficient AI stacks. The work highlights the practical significance of scalable, energy-efficient AI across heterogeneous accelerators and the critical role of cross-layer design in addressing power delivery and operating-cost challenges in modern data centers.
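As a rough illustration of the time-energy Pareto frontier mentioned above, the sketch below filters a set of (time, energy) measurements down to the non-dominated ones. The function name `pareto_frontier` and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own construction may differ.

```python
from typing import List, Tuple

# Each point is (time_seconds, energy_joules); lower is better on both axes.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both time and energy."""
    frontier: List[Point] = []
    best_energy = float("inf")
    # Sweep in order of increasing time (ties broken by lower energy first).
    # A point is kept only if it uses strictly less energy than every
    # faster-or-equal point seen so far.
    for time_s, energy_j in sorted(points):
        if energy_j < best_energy:
            frontier.append((time_s, energy_j))
            best_energy = energy_j
    return frontier


# Hypothetical measurements for six configurations of the same workload.
configs = [
    (10.0, 500.0),
    (12.0, 420.0),
    (11.0, 520.0),
    (15.0, 400.0),
    (15.0, 450.0),
    (20.0, 410.0),
]
frontier = pareto_frontier(configs)
for time_s, energy_j in frontier:
    print(f"time={time_s:.1f}s energy={energy_j:.1f}J")
```

Every configuration off the frontier is both no faster and no more frugal than some point on it, so an optimizer only needs to choose among the frontier points when trading time for energy.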
Abstract
The "AI for Science, Energy, and Security" report from DOE outlines a significant focus on developing and optimizing artificial intelligence workflows for a foundational impact on a broad range of DOE missions. With the pervasive use of artificial intelligence (AI) and machine learning (ML) tools and techniques, their energy efficiency is likely to become the gating factor for adoption. This is because generative AI (GenAI) models are massive energy hogs: for instance, training a 200-billion parameter large language model (LLM) at Amazon is estimated to have taken 11.9 GWh, which is enough to power more than a thousand average U.S. households for a year. Inference consumes even more energy, because a model trained once serves millions. Given this scale, high energy efficiency is key to addressing the power delivery problem of constructing and operating new supercomputers and datacenters specialized for AI workloads. In that regard, we outline software- and architecture-level research challenges and opportunities, setting the stage for creating cross-layer energy optimizations in AI systems.
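The household comparison above can be sanity-checked with a short calculation. The 11.9 GWh figure comes from the abstract; the per-household consumption of roughly 10,800 kWh per year is an assumption supplied here for illustration (approximately the U.S. residential average), not a number given in the paper.

```python
# Training energy quoted in the abstract.
TRAINING_ENERGY_GWH = 11.9

# ASSUMPTION (not from the paper): an average U.S. household uses
# roughly 10,800 kWh of electricity per year.
HOUSEHOLD_KWH_PER_YEAR = 10_800

# 1 GWh = 1,000,000 kWh.
training_kwh = TRAINING_ENERGY_GWH * 1_000_000

# Number of households that the training energy could power for one year.
households = training_kwh / HOUSEHOLD_KWH_PER_YEAR
print(f"Household-years of electricity: {households:.0f}")
```

Under this assumption the result lands a little above one thousand household-years, consistent with the abstract's "more than a thousand" claim.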
