
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI

Arya Tschand, Arun Tejusve Raghunath Rajan, Sachin Idgunji, Anirban Ghosh, Jeremy Holleman, Csaba Kiraly, Pawan Ambalkar, Ritika Borkar, Ramesh Chukka, Trevor Cockrell, Oliver Curtis, Grigori Fursin, Miro Hodak, Hiwot Kassa, Anton Lokhmotov, Dejan Miskovic, Yuechao Pan, Manu Prasad Manmathan, Liz Raymond, Tom St. John, Arjun Suresh, Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi

TL;DR

MLPerf Power addresses the pressing need to quantify energy efficiency across the full spectrum of ML systems, from microwatts to megawatts, using a standardized, reproducible methodology. It extends the MLPerf benchmarking framework with full-system power measurement, cross-scale rules, representative workloads, and transparent reporting, enabling fair comparisons of edge, datacenter, and HPC deployments for both inference and training. The paper presents a large-scale, multi-year study (1,841 measurements) that reveals strong energy-efficiency gains for newer generative workloads, highlights nonlinear energy scaling in large deployments, and demonstrates the substantial impact of software optimizations and quantization alongside hardware advances. These findings inform design choices for sustainable AI, guide industry collaboration, and provide a foundation for regulatory alignment and future benchmarking innovations.

Abstract

Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This paper introduces MLPerf Power, a comprehensive benchmarking methodology capable of evaluating the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. We use representative workloads from the MLPerf benchmark suite to collect 1,841 reproducible measurements from 60 systems across the entire range of ML deployment scales. Our analysis reveals trade-offs between performance, complexity, and energy efficiency across this wide range of systems, providing actionable insights for designing optimized ML solutions from the smallest edge devices to the largest cloud infrastructures. This work emphasizes the importance of energy efficiency as a key metric in evaluating and comparing ML systems, laying the foundation for future research in this critical area. We discuss the implications for developing sustainable AI solutions and standardizing energy efficiency benchmarking for ML systems.
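The central quantity here is energy efficiency: useful work delivered per unit of energy consumed. As an illustration only, the Python sketch below assumes that system power is logged as (time, watts) samples over a measurement window, integrates those samples with the trapezoidal rule to get energy in joules, and reports efficiency as samples per joule. The function names are hypothetical; this is not the MLPerf Power measurement harness, and it does not reproduce the benchmark's actual reporting rules.

```python
# Illustrative sketch only; hypothetical helpers, not the MLPerf Power harness.

def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of one trapezoid: mean power over the interval times its length.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def samples_per_joule(num_samples, timestamps_s, power_w):
    """Energy efficiency as work completed per joule consumed."""
    return num_samples / energy_joules(timestamps_s, power_w)


if __name__ == "__main__":
    # Assumed example: a 1 s window at a constant 200 W, 1000 samples processed.
    t = [0.0, 0.5, 1.0]
    p = [200.0, 200.0, 200.0]
    print(energy_joules(t, p))            # 200.0
    print(samples_per_joule(1000, t, p))  # 5.0
```

Because a watt is one joule per second, dividing throughput (1000 samples/s) by average power (200 W) gives the same 5 samples per joule; integrating the samples instead of assuming constant power matters when power varies during the run.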

Paper Structure

This paper contains 20 sections, 11 figures, and 1 table.

Figures (11)

  • Figure 1: MLPerf performance improvements have outpaced Moore's Law. This trend highlights the rapid evolution of AI systems, prompting the development of MLPerf Power to address emerging concerns over their energy efficiency.
  • Figure 2: The power consumption range across MLPerf divisions, highlighting the need for scalable power measurement.
  • Figure 3: ML system components within the MLPerf Power measurement scope are outlined in green.
  • Figure 4: Measurement diagrams for Tiny, Multi-SUT Inference, and Training systems.
  • Figure 5: Comparison of energy efficiency trends for datacenter, edge, and tiny inference.