Corporate Greenwashing Detection in Text -- a Survey

Tom Calamai, Oana Balalau, Théo Le Guenedal, Fabian M. Suchanek

TL;DR

The paper surveys NLP methods for detecting greenwashing in climate-related corporate text, arguing that no gold-standard greenwashing dataset currently exists and that researchers rely on intermediary tasks to approximate detection. It clusters work into pretraining domain-specific models, climate-topic detection, and thematic analysis (TCFD/ESG), then expands to in-depth climate-risk classification, green claim detection, stance, Q&A, deceptive techniques, and environmental performance prediction. Across sections, the authors report that transformer-based models dominate with strong but sometimes limited gains, and that many results depend heavily on dataset definitions and labeling quality. They underscore major open challenges: evaluation methodology, model robustness to noise and adversarial inputs, data access and reproducibility, and the need to link texts to regulatory standards to ground judgments. Collectively, the survey maps a multi-layered NLP pipeline for greenwashing detection, highlights the gaps between theory and practice, and calls for real-world, regulator-aligned datasets to enable reliable, scalable detection and accountability in climate communications.

Abstract

Greenwashing is an effort to mislead the public about the environmental impact of an entity, such as a state or company. We provide a comprehensive survey of the scientific literature addressing natural language processing methods to identify potentially misleading climate-related corporate communications, indicative of greenwashing. We break the detection of greenwashing into intermediate tasks, and review the state-of-the-art approaches for each of them. We discuss datasets, methods, and results, as well as limitations and open challenges. We also provide an overview of how far the field has come as a whole, and point out future research directions.

Paper Structure

This paper contains 128 sections, 1 figure, 23 tables.

Figures (1)

  • Figure 1: Upset plot of the methods used in the studies mentioned in this literature review. Each bar counts the number of studies reporting a given set of methods. The number of papers using a given method is reported on the horizontal bars. We have highlighted in red the intersection sets that include both fine-tuned Transformers and keyword-based methods.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2