Towards Carbon Footprint-Aware Recommender Systems for Greener Item Recommendation

Raoul Kalisvaart, Masoud Mansoury, Alan Hanjalic, Elvin Isufi

TL;DR

This work tackles carbon footprint-aware recommender systems by introducing RecipeEmission, the first RecSys dataset that includes item-level CO$_2$-eq footprints and greenness scores. It benchmarks nine conventional RecSys algorithms and shows that accuracy-optimized models do not prioritize greener items, while longer lists tend to be greener but less accurate. A simple, modular reranking approach combines the predicted relevance $\hat{r}_{u,i}$ of item $i$ for user $u$ with the item's greenness $g_i$ through a weight $\alpha$, $\mu_{u,i} = \alpha \hat{r}_{u,i} + (1-\alpha) g_i$. It yields substantial greenness gains with only small accuracy losses, enabling greener recommendations without retraining. The dataset, metrics, and reranking method lay a foundation for multi-criteria, sustainability-focused RecSys research, with practical implications for greener item recommendations in e-commerce and beyond.
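
The reranking rule $\mu_{u,i} = \alpha \hat{r}_{u,i} + (1-\alpha) g_i$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `rerank`, the min-max normalization of predicted relevance to [0, 1], and the convention that $g_i \in [0, 1]$ with higher values meaning greener are all assumptions made for the example.

```python
import numpy as np


def rerank(scores, greenness, alpha, k):
    """Return indices of the top-k candidates under mu = alpha * r_hat + (1 - alpha) * g.

    scores    -- predicted relevance of each candidate item for one user
    greenness -- greenness g_i of the same candidates, assumed in [0, 1] (higher = greener)
    alpha     -- trade-off weight in [0, 1]; alpha = 1 keeps the base ranking
    k         -- length of the returned recommendation list
    """
    scores = np.asarray(scores, dtype=float)
    greenness = np.asarray(greenness, dtype=float)

    # Assumption: min-max normalize relevance to [0, 1] so it is on the same
    # scale as greenness before mixing (this step is not specified in the summary).
    span = scores.max() - scores.min()
    r_hat = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

    mu = alpha * r_hat + (1.0 - alpha) * greenness
    # Sort by descending mu; the stable sort keeps the input order for ties.
    return np.argsort(-mu, kind="stable")[:k]


# Toy candidates for one user, already ordered by predicted relevance.
scores = [4.8, 4.5, 4.1, 3.9]
greenness = [0.1, 0.2, 0.9, 0.8]

print(rerank(scores, greenness, alpha=1.0, k=2).tolist())  # [0, 1]: base ranking
print(rerank(scores, greenness, alpha=0.5, k=2).tolist())  # [2, 0]: greener item promoted
print(rerank(scores, greenness, alpha=0.0, k=2).tolist())  # [2, 3]: greenness only
```

Because the rule only re-scores a candidate list that a base model has already produced, it needs no access to that model's parameters, which is what makes a post hoc, retraining-free reranker possible.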

Abstract

The convenience and widespread use of online shopping are having an unprecedented impact on the climate, with emission figures from key actors that are easily comparable to those of a large metropolis. Although online shopping is fueled by recommender system (RecSys) algorithms, their role and potential in promoting more sustainable choices have received little study. One main reason may be the lack of a dataset containing the carbon footprint emissions of items. Building such a dataset is challenging, but it is pivotal for opening the door to new perspectives, evaluations, and methods in RecSys research. In this paper, we target this bottleneck and study the environmental role of RecSys algorithms. First, we mine a dataset that includes carbon footprint emissions for its items. Then, we benchmark conventional RecSys algorithms in terms of accuracy and sustainability as two sides of the same coin. We find that RecSys algorithms optimized for accuracy overlook greenness and that longer recommendation lists are greener but less accurate. Next, we show that a simple reranking approach that accounts for each item's carbon footprint achieves a better trade-off between accuracy and greenness. This reranking approach is modular, ready to use, and can be applied to any RecSys algorithm without altering its underlying mechanisms or retraining models. Our results show that a small sacrifice in accuracy can yield significant improvements in recommendation greenness across all algorithms and list lengths. Arguably, this accuracy-greenness trade-off could even be seen as an enhancement of user satisfaction, particularly for purpose-driven users who prioritize the environmental impact of their choices. We anticipate that this work will serve as a starting point for studying RecSys for more sustainable recommendations.
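
The accuracy-greenness trade-off can be traced by sweeping the weight $\alpha$ in $\mu_{u,i} = \alpha \hat{r}_{u,i} + (1-\alpha) g_i$ and scoring each reranked list on both axes. The sketch below is hypothetical: precision@k and mean greenness@k are stand-in metrics chosen for illustration (not necessarily those defined in the paper), the function names are invented, and the toy scores are made up rather than taken from RecipeEmission.

```python
import numpy as np


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k items that are in the user's held-out relevant set."""
    return len(set(ranked[:k]) & set(relevant)) / k


def greenness_at_k(ranked, greenness, k):
    """Mean greenness of the top-k items (hypothetical list-level metric)."""
    return float(np.mean([greenness[i] for i in ranked[:k]]))


def sweep_alpha(r_hat, greenness, relevant, k, alphas):
    """Rerank with mu = alpha * r_hat + (1 - alpha) * g for each alpha and score the list."""
    r_hat = np.asarray(r_hat, dtype=float)
    greenness = np.asarray(greenness, dtype=float)
    rows = []
    for alpha in alphas:
        mu = alpha * r_hat + (1.0 - alpha) * greenness
        ranked = np.argsort(-mu, kind="stable").tolist()
        rows.append((alpha,
                     precision_at_k(ranked, relevant, k),
                     greenness_at_k(ranked, greenness, k)))
    return rows


# Toy data for one user: relevance already scaled to [0, 1], greenness in [0, 1].
r_hat = [0.9, 0.8, 0.7, 0.3, 0.2]
greenness = [0.1, 0.3, 0.6, 0.9, 0.8]
relevant = {0, 1, 3}  # items the user rated highly in a held-out split

for alpha, prec, green in sweep_alpha(r_hat, greenness, relevant, k=2, alphas=[1.0, 0.5, 0.0]):
    print(f"alpha={alpha:.1f}  precision@2={prec:.2f}  greenness@2={green:.2f}")
```

On this toy data, lowering $\alpha$ from 1.0 to 0.0 raises greenness@2 from 0.20 to 0.85 while precision@2 drops from 1.00 to 0.50. The numbers only illustrate the mechanics of the sweep and say nothing about the paper's reported results.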

Paper Structure

This paper contains 17 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: An example of an entry in the RecipeEmission dataset.
  • Figure 2: Distribution of the statistics for the RecipeEmission and conventional datasets. (a) Long-tail distributions of user engagement (total ratings per user). RecipeEmission has a similar distribution to the original Food.com dataset and the baselines. No user has a very large number of ratings because of the filtering steps on the minimum number of interactions per item. (b) Long-tail distributions of item popularity (total ratings per item). RecipeEmission closely follows the distribution of the original Food.com dataset. The MovieLens datasets are filtered on the minimum number of interactions per user, which shows up as truncated distributions in this panel. (c) Rating distributions scaled to the set $\{1, \ldots, 5\}$. All datasets except Book-Crossing are skewed towards high rating values. RecipeEmission preserves the distribution of the original Food.com dataset, which follows a similar pattern to the MovieLens datasets.
  • Figure 3: Emission distributions of the RecipeEmission dataset. (a) CO$_2$-eq values of users and items. The high CO$_2$-eq outliers are again the result of recipes with large quantities of high-CO$_2$ products. (b) Greenness distributions of the RecipeEmission dataset. (c) Distribution of emissions per rating. The large number of outliers is caused by the long-tail distribution of recipe CO$_2$-eq values. (d) Greenness distributions over the rating values. There is no apparent shift in the distribution across ratings, indicating that users did not account for item greenness when rating.
  • Figure 4: Overview of the steps taken to build the dataset. First, pre-filtering is performed, followed by ingredient and CO$_2$-eq quantification. Finally, the CO$_2$-eq values are transformed to a greenness scale.
  • Figure 5: Rating and greenness distributions of the training, validation and testing splits and the full datasets. Sparsities are 99.9%, 99.9%, 99.9% and 99.9% respectively.
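
The dataset-building steps summarized in the Figure 2, 4 and 5 captions can be sketched with a few small functions. This is a hypothetical illustration, not the authors' pipeline: the per-item minimum-interaction filter follows the Figure 2 caption, but computing a recipe's CO$_2$-eq as the sum of ingredient quantity times an emission factor, mapping CO$_2$-eq to greenness with an inverted min-max scaling, and all function names, ingredient names and factor values are assumptions made for the example. Sparsity is one minus the fraction of observed user-item pairs.

```python
from collections import Counter


def filter_min_item_interactions(ratings, min_per_item):
    """Keep (user, item, rating) triples whose item has at least min_per_item ratings."""
    counts = Counter(item for _, item, _ in ratings)
    return [r for r in ratings if counts[r[1]] >= min_per_item]


def recipe_co2(ingredients, factors):
    """Assumed recipe CO2-eq: sum over ingredients of quantity x emission factor."""
    return sum(qty * factors[name] for name, qty in ingredients)


def to_greenness(co2_values):
    """Illustrative inverted min-max scaling to [0, 1]; 1.0 = lowest CO2-eq (greenest)."""
    lo, hi = min(co2_values), max(co2_values)
    if hi == lo:
        return [1.0] * len(co2_values)
    return [(hi - v) / (hi - lo) for v in co2_values]


def sparsity(ratings):
    """1 - (#observed ratings) / (#users x #items); expects a non-empty list."""
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    return 1.0 - len(ratings) / (len(users) * len(items))


# Toy interactions; item "r2" has a single rating and is dropped.
ratings = [("u1", "r1", 5), ("u2", "r1", 4), ("u1", "r2", 3),
           ("u3", "r3", 5), ("u2", "r3", 4)]
kept = filter_min_item_interactions(ratings, min_per_item=2)
print(len(kept), round(sparsity(kept), 3))  # 4 0.333

# Placeholder emission factors (kg CO2-eq per kg), not real data.
factors = {"high_co2_ingredient": 50.0, "low_co2_ingredient": 1.0}
co2 = [recipe_co2([("high_co2_ingredient", 0.2), ("low_co2_ingredient", 0.5)], factors),
       recipe_co2([("low_co2_ingredient", 0.5)], factors),
       recipe_co2([("low_co2_ingredient", 3.0)], factors)]
print(to_greenness(co2))  # [0.0, 1.0, 0.75]
```

A single pass suffices here because only items are filtered; if per-user thresholds were also applied, the two filters would need to be repeated until no more triples are removed. Given the long-tail CO$_2$-eq distribution in Figure 3, a log transform before scaling may also be worth considering.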