Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning

Alexandra Sasha Luccioni; Alex Hernandez-Garcia

TL;DR

This survey broadens the analysis of ML carbon emissions by compiling data on 95 models across NLP and CV tasks to examine their energy sources, the magnitude of their emissions, and how these emissions evolve over time. It presents a transparent framework for estimating emissions via $C = P \times T \times I = E \times I$, where $C$ is the carbon emitted, $P$ the power of the training hardware, $T$ the training time, $I$ the carbon intensity of the energy used, and $E = P \times T$ the energy consumed. It highlights that carbon intensity and training time drive most of the variability, while hardware power alone has limited influence. The study finds that emissions have risen markedly over the years, particularly with transformer-based and NAS-driven models, yet no universal link between energy use and task performance emerges. It concludes with a call for standardized reporting and a centralized hub to track life-cycle emissions, as well as for broader life-cycle analyses that include deployment and manufacturing.
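Once its inputs are known, the estimate $C = P \times T \times I = E \times I$ is plain arithmetic. The following Python sketch assumes power in watts, time in hours and carbon intensity in grams of CO2-equivalent per kWh. The GPU count, per-GPU power draw and both intensity values are hypothetical illustrations, not figures from the survey. It shows how the same energy budget yields very different emissions depending on the grid that supplies it.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """E = P x T, converting watts to kilowatts."""
    return power_watts * hours / 1000


def carbon_kg(energy: float, intensity_g_per_kwh: float) -> float:
    """C = E x I, converting grams of CO2-equivalent to kilograms."""
    return energy * intensity_g_per_kwh / 1000


# Hypothetical training run: 8 GPUs drawing about 300 W each for 100 hours.
energy = energy_kwh(power_watts=8 * 300, hours=100)

# Hypothetical carbon intensities (g CO2eq/kWh) for a low-carbon and a carbon-intensive grid.
low = carbon_kg(energy, intensity_g_per_kwh=20)
high = carbon_kg(energy, intensity_g_per_kwh=800)

print(f"{energy:.0f} kWh -> {low:.1f} kg (low-carbon grid) vs {high:.1f} kg (carbon-intensive grid)")
```

With the energy held fixed, the ratio between the two emission estimates equals the ratio between the two carbon intensities (40x here), which is why the choice of energy source can matter as much as the amount of computation.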

Abstract

Machine learning (ML) requires energy to carry out computations during model training. Generating this energy comes with an environmental cost in terms of greenhouse gas emissions, which depends on the quantity used and the energy source. Existing research on the environmental impacts of ML has been limited to analyses covering a small number of models and does not adequately represent the diversity of ML models and tasks. In the current study, we present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision. We analyze them in terms of the energy sources used, the amount of CO2 emissions produced, how these emissions evolve over time and how they relate to model performance. We conclude with a discussion of the carbon footprint of our field and propose the creation of a centralized repository for reporting and tracking these emissions.

Paper Structure

This paper contains 31 sections, 1 equation, 6 figures and 3 tables.

Introduction
Related work
Empirical studies on carbon emissions
Tools and approaches for measuring carbon emissions
Broader impacts of ML models
Efficient algorithms and hardware
Other aspects of the carbon impact of ML
Methodology
Data collection
Estimating carbon emissions
Carbon Intensity
Hardware power
Training Time
Data analysis
What are the main sources of energy used for training ML models?
...and 16 more sections

Figures (6)

Figure 1: Map of the countries where the models in the data set were trained, as reported by the authors. Colors encode the median carbon intensity of the energy used by the models trained in each country. The legend indicates the number of models trained in each country, as well as a colored patch marking the main energy source; see the bottom of the legend for the values.
Figure 2: Estimated energy consumed (kWh) and CO2 emitted (kg) by each model in the data set, plotted on a log-log scale. Colors indicate the principal energy source, and dot size indicates carbon intensity. While the relationship between energy and carbon emissions is mostly linear, the data show that models trained with less carbon-intensive energy (e.g. hydroelectric) emit orders of magnitude less carbon than those trained with more carbon-intensive energy (e.g. coal).
Figure 3: CO2 emitted (in kg) by all models included in the data set, on a logarithmic scale. Each small marker corresponds to a model, and the large markers indicate the 99% trimmed mean within each task and year(s) of publication. The error bars cover the bootstrapped 99% confidence intervals. The gray line corresponds to the average over all tasks.
Figure 4: Comparison of the performance achieved by each model trained on Machine Translation (top left, evaluated using BLEU score on the English-French and English-German WMT datasets), Image Classification (top right, measured using Top-1 accuracy on ImageNet), Question Answering (bottom left, evaluated using F1 score on SQuAD v.1) and Named Entity Recognition (bottom right, evaluated using F1 score on the CoNLL dataset), against the CO2 emitted to train each model. The black curves correspond to the Pareto fronts of the data; that is, data points below the curve are sub-optimal in terms of performance and CO2 emitted. Note that the x-axis is on a logarithmic scale.
Figure 5: Comparison of the performance achieved by each model trained on Machine Translation (BLEU score) and Image Classification (top-1 accuracy) against the energy consumed.
...and 1 more figure
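Figure 4 draws Pareto fronts of task performance versus CO2 emitted. The following Python sketch computes such a front for a list of (model, CO2, score) records: a record stays on the front unless another record emits no more CO2 and scores no lower, while being strictly better on at least one of the two. This is a generic non-dominated-set sweep, assumed here for illustration and not necessarily the procedure the authors used to draw their figures; the `Result` type, the model names and all numbers in the example are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str      # hypothetical model identifier
    co2_kg: float  # CO2 emitted during training, lower is better
    score: float   # task metric such as BLEU or top-1 accuracy, higher is better


def pareto_front(results: List[Result]) -> List[Result]:
    """Return the results that no other result dominates on (CO2, score).

    Exact duplicates collapse to a single entry on the front.
    """
    # Sort by CO2 ascending; among equal CO2, put the best score first.
    ordered = sorted(results, key=lambda r: (r.co2_kg, -r.score))
    front: List[Result] = []
    best_score = float("-inf")
    for r in ordered:
        # Every earlier result emits no more CO2, so r is on the front
        # only if it scores strictly higher than all of them.
        if r.score > best_score:
            front.append(r)
            best_score = r.score
    return front


# Hypothetical data points, not taken from the survey.
results = [
    Result("A", 10.0, 30.0),
    Result("B", 100.0, 35.0),
    Result("C", 50.0, 28.0),
    Result("D", 1000.0, 34.0),
    Result("E", 500.0, 40.0),
]
front = pareto_front(results)
print([r.name for r in front])  # ['A', 'B', 'E']
```

Here C falls below the front because A emits less and scores higher, and D falls below it because B does. Sorting once makes the sweep O(n log n) rather than comparing every pair of models.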
