
From Clicks to Carbon: The Environmental Toll of Recommender Systems

Tobias Vente, Lukas Wegmeth, Alan Said, Joeran Beel

TL;DR

This paper reveals a substantial environmental cost in recommender-systems research, showing that modern deep-learning approaches consume significantly more energy and generate far higher CO2 equivalents than traditional methods without consistent performance benefits. By reproducing representative pipelines from 2013 and 2023 ACM RecSys papers, measuring energy with hardware meters, and converting to CO2e across multiple locations and hardware configurations, the authors quantify both per-paper and conference-wide footprints. The key contributions include a detailed comparative analysis of hardware, software libraries, datasets, and open-source code practices, plus a clear demonstration of how geography and hardware choice influence emissions. The findings highlight the need for transparent reporting of experimental pipelines, careful algorithm/dataset selection, and sustainable practices to mitigate the environmental impact of recommender-systems research, while providing concrete baselines and methodology for reproducibility and future optimization.

Abstract

As global warming soars, the need to assess the environmental impact of research is becoming increasingly urgent. Despite this, few recommender systems research papers address their environmental impact. In this study, we estimate the environmental impact of recommender systems research by reproducing typical experimental pipelines. Our analysis spans 79 full papers from the 2013 and 2023 ACM RecSys conferences, comparing traditional "good old-fashioned AI" algorithms with modern deep learning algorithms. We designed and reproduced representative experimental pipelines for both years, measuring energy consumption with a hardware energy meter and converting it to CO2 equivalents. Our results show that papers using deep learning algorithms emit approximately 42 times more CO2 equivalents than papers using traditional methods. On average, a single deep-learning-based paper generates 3,297 kilograms of CO2 equivalents, which is more than the carbon emissions of one person flying from New York City to Melbourne or the amount of CO2 one tree sequesters over 300 years.
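
The conversion step described above can be pictured as multiplying measured energy (kWh) by a carbon-intensity factor (gCO2e per kWh) for a given location and year. The sketch below assumes that simple linear model; the function names `to_gco2e` and `footprint_by_region` and the region factors are hypothetical placeholders, not code or values from the paper.

```python
# Minimal sketch: energy (kWh) -> CO2 equivalents (g) via a carbon-intensity factor.
# Assumption: emissions scale linearly with energy; all factors here are hypothetical.


def to_gco2e(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Convert measured energy in kWh to grams of CO2 equivalents."""
    if energy_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * intensity_g_per_kwh


def footprint_by_region(energy_kwh: float, intensities: dict[str, float]) -> dict[str, float]:
    """Apply each region's carbon intensity (gCO2e/kWh) to the same measured energy."""
    return {region: to_gco2e(energy_kwh, factor) for region, factor in intensities.items()}


if __name__ == "__main__":
    # Hypothetical placeholder factors, NOT real grid data or values from the paper.
    placeholder_intensities = {"region_a": 100.0, "region_b": 500.0}
    print(footprint_by_region(2.0, placeholder_intensities))
    # prints {'region_a': 200.0, 'region_b': 1000.0}
```

If two pipelines are converted with the same factor, the factor cancels, so their CO2e ratio equals their energy ratio; the location then changes the absolute emissions but not that ratio.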

Paper Structure (30 sections, 2 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Average energy consumption of recommender system algorithms across twelve datasets, alongside the average consumption per dataset across sixteen algorithms. Blue vertical lines represent the average consumption in kWh. The upper x-axis displays the equivalent CO2 emissions in grams (gCO2e), calculated based on the 2023 world average.
  • Figure 2: Total energy consumption (in kWh) vs. averaged, normalized nDCG@10 performance. Blue data points represent algorithms run on CPUs and red data points algorithms run on GPUs; a cross marks a traditional algorithm and a dot a deep learning algorithm. The nDCG@10 is normalized within each dataset, so that every dataset has equal weight, and then averaged across all twelve datasets. The upper x-axis shows the gCO2e emissions, calculated using the 2023 world average.
  • Figure 3: The relationship between energy consumption (in kWh) and runtime (in seconds), covering the training, prediction, and evaluation phases. Each data point represents an algorithm applied to one of the twelve datasets. The lines show linear regression fits for the respective groups of data points. The right-hand y-axis displays the corresponding gCO2e emissions, calculated using the 2023 world average.
  • Figure 4: Average energy consumption of traditional algorithms executed on 2013 hardware across seven datasets. Separate vertical lines indicate the average energy consumption in kWh for ranking-prediction and rating-prediction tasks. The upper x-axis shows the gCO2e emissions, calculated using the 2023 world average. Not every algorithm is suited to both rating- and ranking-prediction tasks, so not every algorithm displays two boxplots.
  • Figure 5: Regional variation in gCO2e per recommender system algorithm type. The x-axis shows gCO2e, and the y-axis lists the regions. Blue bars represent emissions from a representative deep learning algorithm executed on the modern workstation hardware, and orange bars those from a traditional algorithm run on 2013 hardware. The gCO2e values are calculated using the respective annual conversion factors, reflecting changes in gCO2e per kWh over the decade.
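
The Figure 2 caption says nDCG@10 is normalized within each dataset and then averaged across datasets, but does not name the normalization. The sketch below assumes min-max scaling per dataset, which may differ from the authors' actual choice; the function names and the toy scores are hypothetical.

```python
# Sketch: per-dataset normalization of nDCG@10, then averaging per algorithm.
# Assumption: min-max scaling within each dataset (the caption does not name the method).
from statistics import mean


def normalize_per_dataset(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Min-max scale each dataset's nDCG@10 values to [0, 1] across algorithms."""
    normalized = {}
    for dataset, by_algo in scores.items():
        lo, hi = min(by_algo.values()), max(by_algo.values())
        span = hi - lo
        normalized[dataset] = {
            # If every algorithm ties on a dataset, map all of them to 0.0.
            algo: (value - lo) / span if span > 0 else 0.0
            for algo, value in by_algo.items()
        }
    return normalized


def average_normalized_ndcg(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each algorithm's normalized nDCG@10 over the datasets it was run on."""
    normalized = normalize_per_dataset(scores)
    algorithms = {algo for by_algo in normalized.values() for algo in by_algo}
    return {
        algo: mean(by_algo[algo] for by_algo in normalized.values() if algo in by_algo)
        for algo in sorted(algorithms)
    }


if __name__ == "__main__":
    # Hypothetical toy scores for two datasets and three algorithms.
    toy = {"d1": {"A": 0.25, "B": 0.75, "C": 0.5}, "d2": {"A": 1.0, "B": 0.0, "C": 0.25}}
    print(average_normalized_ndcg(toy))  # {'A': 0.5, 'B': 0.5, 'C': 0.375}
```

Normalizing per dataset keeps a dataset with generally high raw nDCG@10 values from dominating the average, which is presumably the "uniform impact" the caption refers to.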