Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning
Alexandra Sasha Luccioni, Alex Hernandez-Garcia
TL;DR
This survey broadens the analysis of ML carbon emissions by compiling 95 models across NLP and CV tasks and examining their energy sources, the magnitude of their emissions, and how those emissions evolve over time. It presents a transparent framework for estimating emissions as $C = P \times T \times I = E \times I$, where $C$ is the carbon emitted, $P$ the power consumption of the hardware, $T$ the training time, $I$ the carbon intensity of the energy source, and $E = P \times T$ the energy consumed. Carbon intensity and training time largely drive the variability in emissions, while hardware power alone has limited influence. The study finds that emissions have risen markedly over the years, particularly with transformer-based and NAS-driven models, yet no universal link between energy use and task performance emerges. It concludes with a call for standardized reporting and a centralized hub to track emissions, as well as broader life-cycle analyses that include deployment and manufacturing.
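The estimate $C = P \times T \times I = E \times I$ can be illustrated with a minimal Python sketch. The function name, the units (kW, hours, kg CO2eq per kWh), and all numeric values below are assumptions chosen for illustration; they are not figures from the survey.

```python
def estimate_emissions_kg(power_kw: float, hours: float,
                          intensity_kg_per_kwh: float) -> float:
    """Return estimated emissions C = P * T * I in kg CO2eq.

    power_kw: average power draw of the hardware (P), in kW
    hours: training time (T), in hours
    intensity_kg_per_kwh: carbon intensity of the energy source (I)
    """
    energy_kwh = power_kw * hours              # E = P * T
    return energy_kwh * intensity_kg_per_kwh   # C = E * I


# Hypothetical training run: 2.4 kW of hardware running for 100 hours.
POWER_KW = 2.4
HOURS = 100.0

# Same energy use on two hypothetical grids with different carbon intensity.
low_carbon = estimate_emissions_kg(POWER_KW, HOURS, 0.02)
high_carbon = estimate_emissions_kg(POWER_KW, HOURS, 0.8)

print(f"low-carbon grid:  {low_carbon:.1f} kg CO2eq")
print(f"high-carbon grid: {high_carbon:.1f} kg CO2eq")
```

Because $P$ and $T$ are held fixed, the two results differ only through $I$, which shows how the same training run can produce very different emissions depending on the energy source.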
Abstract
Machine learning (ML) requires using energy to carry out computations during the model training process. The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity used and the energy source. Existing research on the environmental impacts of ML has been limited to analyses covering a small number of models and does not adequately represent the diversity of ML models and tasks. In the current study, we present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision. We analyze them in terms of the energy sources used, the amount of CO2 emissions produced, how these emissions evolve across time, and how they relate to model performance. We conclude with a discussion regarding the carbon footprint of our field and propose the creation of a centralized repository for reporting and tracking these emissions.
