Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation
Aniketh Iyengar, Jiaqi Han, Boris Ruf, Vincent Grari, Marcin Detyniecki, Stefano Ermon
TL;DR
This paper adapts Kaplan-style scaling laws to predict the energy consumption $E$ of diffusion-model inference from compute (FLOPs), with hardware-specific modifiers. The authors decompose inference into text encoding, iterative denoising, and decoding, show that denoising dominates compute, and validate a near-linear scaling of energy with FLOPs across four diffusion models and three GPUs. The fitted law achieves $R^2 > 0.9$ within individual architectures and generalizes well across models and hardware, supporting pre-deployment energy budgeting, carbon-aware optimization, and standardized energy reporting. The framework is a practical tool for sustainable AI deployment, informing energy-conscious choices of precision, step count, resolution, and hardware.
Abstract
The rapidly growing computational demands of diffusion models for image generation have raised significant concerns about energy consumption and environmental impact. While existing approaches to energy optimization focus on architectural improvements or hardware acceleration, there is a lack of principled methods to predict energy consumption across different model configurations and hardware setups. We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. We conduct comprehensive experiments across four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) on three GPU architectures (NVIDIA A100, A4000, A6000), spanning various inference configurations including resolution (256x256 to 1024x1024), precision (fp16/fp32), step counts (10-50), and classifier-free guidance settings. Our energy scaling law achieves high predictive accuracy within individual architectures (R-squared > 0.9) and exhibits strong cross-architecture generalization, maintaining high rank correlations across models and enabling reliable energy estimation for unseen model-hardware combinations. These results validate the compute-bound nature of diffusion inference and provide a foundation for sustainable AI deployment planning and carbon footprint estimation.
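The decomposition and the near-linear energy-FLOPs relationship described in the abstract can be illustrated with a minimal sketch. It assumes a power-law form $E = a \cdot F^{b}$ fitted in log-log space, which is one plausible reading of a Kaplan-style law and not necessarily the exact functional form used in the paper. The function names, the per-stage FLOP counts, the factor of two for classifier-free guidance, and the energy values are all hypothetical; the energy data is synthetic and is not a measurement from any GPU.

```python
import math
import random


def total_flops(f_text, f_denoise_step, f_decode, steps, use_cfg=False):
    """Total inference FLOPs: text encoding + repeated denoising + decoding.

    Assumption: classifier-free guidance runs the denoiser twice per step
    (one conditional and one unconditional pass).
    """
    cfg_factor = 2 if use_cfg else 1
    return f_text + steps * cfg_factor * f_denoise_step + f_decode


def fit_power_law(flops, energy):
    """Ordinary least squares on log E = log a + b * log F.

    Returns (a, b, r2), where r2 is computed in log space.
    """
    xs = [math.log(f) for f in flops]
    ys = [math.log(e) for e in energy]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Hypothetical per-stage FLOP counts for a single model and resolution.
F_TEXT, F_STEP, F_DECODE = 1e10, 1e12, 2e12

# Sweep step counts (10-50) with and without classifier-free guidance.
configs = [(s, cfg) for s in (10, 20, 30, 40, 50) for cfg in (False, True)]
flops = [total_flops(F_TEXT, F_STEP, F_DECODE, s, cfg) for s, cfg in configs]

# Synthetic "measured" energy in joules: linear in FLOPs with ~2% noise.
rng = random.Random(0)
energy = [5e-11 * f * math.exp(rng.gauss(0.0, 0.02)) for f in flops]

a, b, r2 = fit_power_law(flops, energy)

# Fraction of total FLOPs spent in denoising at 50 steps without guidance.
denoise_share = 50 * F_STEP / total_flops(F_TEXT, F_STEP, F_DECODE, 50)

print(f"a={a:.3e} J/FLOP^b, b={b:.3f}, R^2={r2:.4f}")
print(f"denoising share of FLOPs at 50 steps: {denoise_share:.1%}")
```

In this toy setting the fitted exponent comes out close to 1 because the synthetic data was generated that way. With real measurements, one fit per GPU would give hardware-specific coefficients, and a large deviation of the exponent from 1 would indicate that inference is not purely compute-bound on that device.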
