Eco-Amazon: Enriching E-commerce Datasets with Product Carbon Footprint for Sustainable Recommendations

Giuseppe Spillo, Allegra De Filippo, Cataldo Musto, Michela Milano, Giovanni Semeraro

TL;DR

Eco-Amazon tackles the lack of item-level Product Carbon Footprint (PCF) metadata in IR/RS benchmarks by introducing PCF-enriched Amazon datasets across Electronics, Home and Kitchen, and Clothing. It deploys a zero-shot prompting framework aligned with the GHG Protocol and ISO 14040/14044 to estimate $\mathrm{CO_2e}$ from public product metadata, and releases both the PCF estimates and an open-source estimation script, plus a ground-truth subset from Environmental Product Declarations for benchmarking. A use case demonstrates PCF-aware post-hoc re-ranking with the sustainability score $\mathrm{SaS}(u,i) = (1-\alpha)\,\mathrm{pred}(u,i) + \alpha\,\mathrm{PCF}_{\mathrm{LLM}}(i)$, where the weight $\alpha$ trades off accuracy against environmental impact. The results show that LLM-based PCF estimation preserves ranking quality and enables multi-objective optimization that reduces carbon footprint with limited accuracy loss, thereby advancing sustainable information retrieval and recommender-system research.
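
The re-ranking step can be illustrated with a minimal Python sketch. It is not the authors' released script: the function names, the toy items, and the scores are hypothetical. It also assumes that both terms are min-max normalized over the candidate list and that the PCF term enters as a "greenness" value (1 minus normalized CO2e), so that lower-emission items score higher; the summary above does not state how the PCF term is scaled.

```python
from typing import Dict, List


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values to [0, 1]; a constant input maps to 0.0 everywhere."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def rerank(pred: Dict[str, float], pcf_kg: Dict[str, float], alpha: float) -> List[str]:
    """Order candidate items by SaS = (1 - alpha) * pred + alpha * green."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rel = min_max(pred)
    # Assumption: the PCF term is a greenness score, i.e. 1 - normalized CO2e,
    # so that a lower footprint increases the combined score.
    green = {item: 1.0 - v for item, v in min_max(pcf_kg).items()}
    sas = {item: (1.0 - alpha) * rel[item] + alpha * green[item] for item in pred}
    return sorted(sas, key=sas.get, reverse=True)


# Hypothetical candidates for one user: recommender scores and PCF in kg CO2e.
pred = {"laptop": 0.9, "kettle": 0.7, "t-shirt": 0.4}
pcf_kg = {"laptop": 300.0, "kettle": 20.0, "t-shirt": 5.0}

baseline = rerank(pred, pcf_kg, alpha=0.0)        # pure accuracy ordering
green_focused = rerank(pred, pcf_kg, alpha=0.75)  # sustainability-weighted ordering
```

With $\alpha = 0$ the list is ordered purely by predicted relevance; raising $\alpha$ moves low-footprint items toward the top at the cost of some relevance, which is the trade-off the use case explores.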

Abstract

In the era of responsible and sustainable AI, information retrieval and recommender systems must expand their scope beyond traditional accuracy metrics to incorporate environmental sustainability. However, this research line is severely limited by the lack of item-level environmental impact data in standard benchmarks. This paper introduces Eco-Amazon, a novel resource designed to bridge this gap. Our resource consists of an enriched version of three widely used Amazon datasets (i.e., Home, Clothing, and Electronics) augmented with Product Carbon Footprint (PCF) metadata. CO2e emission scores were generated using a zero-shot framework that leverages Large Language Models (LLMs) to estimate item-level PCF based on product attributes. Our contribution is three-fold: (i) the release of the Eco-Amazon datasets, enriching item metadata with PCF signals; (ii) the LLM-based PCF estimation script, which allows researchers to enrich any product catalogue and reproduce our results; (iii) a use case demonstrating how PCF estimates can be exploited to promote more sustainable products. By providing these environmental signals, Eco-Amazon enables the community to develop, benchmark, and evaluate the next generation of sustainable retrieval and recommendation models. Our resource is available at https://doi.org/10.5281/zenodo.18549130, while our source code is available at: http://github.com/giuspillo/EcoAmazon/.

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Analysis of MAE per PCF class
  • Figure 2: Comparison of recommendation models (BPR vs. LightGCN) across datasets and metrics for different values of $\alpha$: $\alpha = 0.75$ is the green-focused configuration, $\alpha = 0.5$ the balanced configuration, $\alpha = 0.25$ the accuracy-focused configuration, and $\alpha = 0$ the baseline, i.e., the ranking before post-processing.