Exploring Nature: Datasets and Models for Analyzing Nature-Related Disclosures

Tobias Schimanski, Chiara Colesanti Senni, Glen Gostlow, Jingwei Ni, Tingyu Yu, Markus Leippold

TL;DR

This work addresses the need to quantify corporate nature disclosures by constructing expert-annotated datasets for water, forest, and biodiversity and training transformer-based classifiers guided by TNFD. It builds a large base corpus, uses multi-stage labeling including GPT-3.5 pre-labeling and human annotation, and evaluates domain-specific models (ClimateBERT, EnvironmentalBERT) against generic baselines with five-fold cross-validation. A case study on 2021 earnings-call transcripts reveals industry and geographic patterns in nature discussions: nature communication is most prevalent in directly affected industries such as agriculture and utilities and in equatorial hotspot areas, a pattern that supports the validity of the method. Overall, the paper provides datasets and classifiers to enable scalable, regulator-relevant analyses of nature-related corporate disclosures, offering practical tools for investors, analysts, and policymakers.

Abstract

Nature is an amorphous concept. Yet, it is essential for the planet's well-being to understand how the economy interacts with it. To address the growing demand for information on corporate nature disclosure, we provide datasets and classifiers to detect nature communication by companies. We ground our approach in the guidelines of the Taskforce on Nature-related Financial Disclosures (TNFD). In particular, we focus on the specific dimensions of water, forest, and biodiversity. For each dimension, we create an expert-annotated dataset with 2,200 text samples and train classifier models. Furthermore, we show that nature communication is more prevalent in hotspot areas and directly affected industries like agriculture and utilities. Our approach is the first to respond to calls to assess corporate nature communication on a large scale.

Paper Structure (16 sections, 5 figures, 9 tables)

Figures (5)

  • Figure 1: Distribution of the labeling data
  • Figure 2: Top 20 industries communicating about nature-related topics, measured as the ratio of nature-related communication to all communication
  • Figure 3: Keyword appearance for biodiversity keywords
  • Figure 4: Confusion matrix for classifying biodiversity sentences with the biodiversity keyword approach
  • Figure 5: Proportion of earnings conference calls that mention nature-related topics in at least one sentence in each country