SpiderGen: Towards Procedure Generation For Carbon Life Cycle Assessments with Generative AI

Anupama Sitaraman, Bharathan Balaji, Yuvraj Agarwal

TL;DR

SpiderGen automates Life Cycle Assessment (LCA) procedure generation by producing Product Category Rule Process Flow Graphs (PCR PFGs) for product categories. It combines zero-shot LLM reasoning, SBERT embeddings, and clustering to build a directed acyclic graph (DAG) of upstream, core, and downstream processes. It formalizes the graph $G_{pc}$ as a DAG constrained by lifecycle phases and uses a four-step pipeline to produce generalizable, phase-ordered process graphs. Evaluated against 65 ground-truth PCRs from EPD International, SpiderGen achieves an average F1-score of 65%, outperforming one-shot baselines, and costs less than \$1 per PFG in under 10 minutes, versus more than \$25,000 and up to 21 person-days for a conventional LCA. The work further analyzes model choices, sample-product effects, and complexity-driven performance. It highlights the practical potential for rapid, scalable, and transparent LCA support, and outlines open challenges in boundary definition and real-world deployment.

Abstract

Investigating the effects of climate change and global warming caused by GHG emissions has been a key concern worldwide. These emissions are largely contributed to by the production, use, and disposal of consumer products. Thus, it is important to build tools to estimate the environmental impact of consumer goods, an essential part of which is conducting Life Cycle Assessments (LCAs). LCAs specify and account for the appropriate processes involved with the production, use, and disposal of the products. We present SpiderGen, an LLM-based workflow that integrates the taxonomy and methodology of traditional LCA with the reasoning capabilities and world knowledge of LLMs to generate graphical representations of the key procedural information used for LCA, known as Product Category Rules Process Flow Graphs (PCR PFGs). We additionally evaluate the output of SpiderGen by comparing it with 65 real-world LCA documents. We find that SpiderGen provides accurate LCA process information that is either fully correct or has minor errors, achieving an F1-Score of 65% across 10 sample data points, as compared to 53% using a one-shot prompting method. We observe that the remaining errors occur primarily due to differences in detail between LCA documents, as well as differences in the "scope" of which auxiliary processes must also be included. We also demonstrate that SpiderGen performs better than several baseline techniques, such as chain-of-thought prompting and one-shot prompting. Finally, we highlight SpiderGen's potential to reduce the human effort and costs for estimating carbon impact, as it is able to produce LCA process information for less than \$1 USD in under 10 minutes, as compared to the status quo LCA, which can cost over \$25,000 USD and take up to 21 person-days.

Paper Structure

This paper contains 33 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: A simplified example PFG for the product category "Wine" produced by SpiderGen.
  • Figure 2: The SpiderGen workflow utilizes LLMs and sentence transformers to create Product Category Rule Process Flow Graphs (PCR PFGs). This workflow consists of (1) a product generation step, where sample products are sourced; (2) the generation of processes for those sample products; (3) a coarse process generation step, where the sample product processes are coarsened to create generalized processes; and (4) the generation of the process flow graph itself. The resulting graphs can then be used for a variety of applications, such as generating carbon footprints and conducting supply chain analysis.
  • Figure 3: Comparing the normalized PMI values as different numbers of products in each category are generated in the first step of the SpiderGen workflow. We observe lower variability in PMI as the number increases.
  • Figure 4: Comparing the normalized PMI scores for 65 products with varying complexity, as denoted by the number of nodes in the ground-truth $G_{pc}$ graphs. SpiderGen has higher normalized PMI for simpler product categories (e.g. "Dairy Products") and lower for more complicated ones (e.g. "Railways"). We also report the qualitative scores for 10 product categories (circled nodes) in Table \ref{fig:ten_sample_table}, and label some of those product categories in this figure.
  • Figure 5: An example comparison of PFGs between (a) SpiderGen and two baselines, (b) LLMDirect and (c) LLMExample, for the product category "Asphalt Mixtures". We note that SpiderGen captures all downstream processes and the majority of core processes, whereas both LLMDirect and LLMExample fail to capture large portions of important processes, such as processing asphalt additives and maintaining asphalt.
  • ...and 4 more figures
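
The TL;DR describes $G_{pc}$ as a DAG constrained by lifecycle phases (upstream, core, downstream). As a rough illustration of what such a constraint could look like in code, the minimal sketch below checks that a candidate graph is acyclic and that no edge points from a later phase back to an earlier one. The representation (a dict of phases plus an edge list), the function name `is_valid_pfg`, the exact form of the phase constraint, and the wine processes are all assumptions made for illustration; none of them are taken from the paper.

```python
from graphlib import TopologicalSorter, CycleError

# Lifecycle phases in the order the TL;DR lists them.
PHASE_ORDER = {"upstream": 0, "core": 1, "downstream": 2}


def is_valid_pfg(phases, edges):
    """Return True if the graph is acyclic and phase-ordered.

    phases: dict mapping process name -> phase name (hypothetical layout)
    edges:  iterable of (source, target) process-name pairs
    """
    edges = list(edges)

    # Every process must carry one of the three known phases.
    if any(phase not in PHASE_ORDER for phase in phases.values()):
        return False

    for src, dst in edges:
        # Both endpoints must be declared processes.
        if src not in phases or dst not in phases:
            return False
        # Assumed constraint: an edge may not run back to an earlier phase.
        if PHASE_ORDER[phases[src]] > PHASE_ORDER[phases[dst]]:
            return False

    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {name: set() for name in phases}
    for src, dst in edges:
        predecessors[dst].add(src)

    try:
        # prepare() raises CycleError if the graph contains a cycle.
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


# Illustrative processes only; these are not the contents of Figure 1.
wine_phases = {
    "grape cultivation": "upstream",
    "fermentation": "core",
    "bottling": "core",
    "distribution": "downstream",
}
wine_edges = [
    ("grape cultivation", "fermentation"),
    ("fermentation", "bottling"),
    ("bottling", "distribution"),
]

print(is_valid_pfg(wine_phases, wine_edges))
```

Cycle detection is delegated to the standard library's `graphlib` (Python 3.9+), so the sketch needs no third-party graph package. A real pipeline would more likely enforce these properties while building the graph than validate them afterwards.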