
Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model

Haotian Hang, Yueyang Shen, Vicky Zhu, Jose Cruz, Michelle Li

TL;DR

The paper addresses the challenge of turning heterogeneous CDP climate disclosures into decision-useful signals by introducing a rubric-guided LLM scoring framework. It generates year-specific rubrics and a time-agnostic master rubric to achieve cross-year comparability over 2010–2020, with percentile normalization and rank validation to ensure reliable benchmarking. Applying this framework reveals sectoral and national patterns, policy-driven shifts, and cross-domain correlations, enabling scalable, interpretable insights for investors, regulators, and corporate ESG leaders. The approach advances AI-enabled decision support in climate governance by converting unstructured narratives into structured, actionable intelligence, and it provides a dataset and scoring templates for integration into ESG dashboards and supply chain assessments.
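The rank validation mentioned above is not specified in this summary. As a minimal sketch, assuming it amounts to checking whether two scoring runs (for example, two different LLMs, or shuffled versus unshuffled prompts as in Figure 3) order the same companies consistently, a Spearman rank correlation can be computed with the standard library alone. The function names and example scores below are hypothetical, not taken from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank variance is zero (all values tied)")
    return cov / (vx * vy) ** 0.5


# Hypothetical rubric scores for five companies from two scoring runs.
run_a = [62, 75, 48, 90, 55]
run_b = [60, 78, 50, 88, 57]
print(spearman_rho(run_a, run_b))
```

A rho near 1 means the two runs rank companies almost identically even if their raw scores differ, which matches the Figure 3 observation that different models give different scores but similar trends.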

Abstract

In the context of global sustainability mandates, corporate carbon disclosure has emerged as a critical mechanism for aligning business strategy with environmental responsibility. The Carbon Disclosure Project (CDP) hosts the world's largest longitudinal dataset of climate-related survey responses, combining structured indicators with open-ended narratives, but the heterogeneity and free-form nature of these disclosures present significant analytical challenges for benchmarking, compliance monitoring, and investment screening. This paper proposes a novel decision-support framework that leverages large language models (LLMs) to assess corporate climate disclosure quality at scale. It develops a master rubric that harmonizes narrative scoring across 11 years of CDP data (2010-2020), enabling cross-sector and cross-country benchmarking. By integrating rubric-guided scoring with percentile-based normalization, our method identifies temporal trends, strategic alignment patterns, and inconsistencies in disclosure across industries and regions. Results reveal that sectors such as technology and countries like Germany consistently demonstrate higher rubric alignment, while others exhibit volatility or superficial engagement, offering insights that inform key decision-making processes for investors, regulators, and corporate environmental, social, and governance (ESG) strategists. The proposed LLM-based approach transforms unstructured disclosures into quantifiable, interpretable, comparable, and actionable intelligence, advancing the capabilities of AI-enabled decision support systems (DSSs) in the domain of climate governance.
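The abstract pairs rubric-guided scoring with percentile-based normalization but does not spell out the normalization. The sketch below assumes one plausible convention: each company's rubric score is converted to a percentile among all companies scored in the same survey year, with ties counted as half. The (company, year, score) record layout and the function name are illustrative assumptions, and the sketch assumes one record per company-year.

```python
from collections import defaultdict
from typing import Iterable


def percentile_within_year(
    records: Iterable[tuple[str, int, float]],
) -> dict[tuple[str, int], float]:
    """Map (company, year) to the percentile of its score within that year.

    Convention (assumed): percent of same-year scores strictly below,
    plus half the percent of scores equal to it (including itself).
    """
    records = list(records)
    by_year: dict[int, list[float]] = defaultdict(list)
    for _, year, score in records:
        by_year[year].append(score)

    result: dict[tuple[str, int], float] = {}
    for company, year, score in records:
        scores = by_year[year]
        below = sum(s < score for s in scores)
        equal = sum(s == score for s in scores)
        result[(company, year)] = 100.0 * (below + 0.5 * equal) / len(scores)
    return result


# Hypothetical scores: three firms in 2010, two tied firms in 2011.
p = percentile_within_year([
    ("A", 2010, 40.0), ("B", 2010, 60.0), ("C", 2010, 80.0),
    ("A", 2011, 70.0), ("B", 2011, 70.0),
])
print(p[("B", 2010)], p[("A", 2011)])
```

This within-year ranking compares each firm only with its contemporaries; scores from a single time-agnostic master rubric could instead be compared directly across years.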

Paper Structure

This paper contains 26 sections, 1 equation, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Summary of supply chain firms that participated in the Carbon Disclosure Project (CDP) from 2010 to 2020. A. Number of participating firms in each country. B. Gross Domestic Product (GDP) divided by the number of companies in each country. C. Number of companies in each industry sector.
  • Figure 2: Changes in word frequency over time. The two most common words/characters in the business strategy section of the CDP survey for companies in selected countries in 2010 (left) and 2020 (right). The height of each bar shows the relative counts of the first and second most common words. Note that Russia, South Africa, and Australia have no 2010 data, so only their 2020 data are shown.
  • Figure 3: Comparison of different scoring methods (Bloomberg). A. Different large language models (LLMs) produce different scores for the same prompt, but the trend is similar. B. Prompt engineering: hiding explicit years in the questionnaire and/or shuffling the data yields the same trend. C. When each year's questionnaire is input into the model separately, the results are completely different and show no temporal trend. D. Quantification of the correlation between different methods.
  • Figure 4: Our proposed main workflow.
  • Figure 5: Grading using yearly rubrics and master rubric of CDP data. A. Distribution of scores for different companies in the year 2010 using the corresponding yearly rubric. B., C. Score percentile of example companies over time (B.) using yearly rubrics and (C.) using the master rubric. D., E. Master rubric score of all the companies during 2010–2020 summarized by (D.) count and (E.) probability density function. F. Master rubric score of example companies over time. Example companies in (B., C., F.): blue: Tessy plastic, orange: Aptargroup, green: Porton, red: DANFOSS, purple: DOMINGUES PAES EMPRESA DE Segurança, brown: ABM INDUSTRIES Inc., pink: BLOOMBERG LP.
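Figure 2 reports the two most frequent words or characters per country and the relative height of their bars. A minimal sketch of that count is shown below. It assumes simple lowercase tokenization on Unicode word characters and an optional stopword list; the paper's actual preprocessing is not described here, and the function name and sample texts are hypothetical.

```python
import re
from collections import Counter


def top_two_tokens(
    texts: list[str], stopwords: frozenset[str] = frozenset()
) -> list[tuple[str, float]]:
    """Return the two most frequent tokens, each with its count
    relative to the most frequent token (so the first is always 1.0)."""
    counts: Counter[str] = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    top = counts.most_common(2)  # ties keep first-encountered order
    if not top:
        return []
    max_count = top[0][1]
    return [(token, count / max_count) for token, count in top]


# Hypothetical strategy-section answers for one country and year.
answers = ["Climate risk and climate strategy", "Climate targets and emissions"]
print(top_two_tokens(answers, stopwords=frozenset({"and"})))
```

With this regex, a run of CJK characters without spaces counts as one token; per-character counts, as the "words/characters" wording in the caption may imply, would need a different tokenizer.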