
ClimateCheck 2026: Scientific Fact-Checking and Disinformation Narrative Classification of Climate-related Claims

Raia Abu Ahmad, Max Upravitelev, Aida Usmanova, Veronika Solopova, Georg Rehm

Abstract

Automatically verifying climate-related claims against scientific literature is a challenging task, complicated by the specialised nature of scholarly evidence and the diversity of rhetorical strategies underlying climate disinformation. ClimateCheck 2026 is the second iteration of a shared task addressing this challenge, expanding on the 2025 edition with tripled training data and a new disinformation narrative classification task. Running from January to February 2026 on the CodaBench platform, the competition attracted 20 registered participants and 8 leaderboard submissions, with systems combining dense retrieval pipelines, cross-encoder ensembles, and large language models with structured hierarchical reasoning. In addition to standard evaluation metrics (Recall@K and Binary Preference), we adapt an automated framework to assess retrieval quality under incomplete annotations, exposing systematic biases in how conventional metrics rank systems. A cross-task analysis further reveals that not all climate disinformation is equally verifiable, with implications for how future fact-checking systems should be designed.
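The official ClimateCheck scoring scripts are not reproduced here, but the two retrieval metrics named above can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical implementation that assumes gold judgments are given per claim as sets of relevant and non-relevant abstract IDs and that each system returns a ranked list of abstract IDs; it uses a common formulation of Binary Preference (bpref), which ignores unjudged documents and is therefore comparatively robust to incomplete annotations. The shared task's official scorer may differ in detail.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Share of gold-relevant abstracts that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def bpref(ranked: List[str], relevant: Set[str], non_relevant: Set[str]) -> float:
    """Binary Preference: rewards ranking judged-relevant abstracts above
    judged-non-relevant ones. Unjudged documents are simply skipped, which
    makes the score less sensitive to incomplete annotations."""
    R, N = len(relevant), len(non_relevant)
    if R == 0:
        return 0.0
    score, non_rel_above = 0.0, 0
    for doc_id in ranked:
        if doc_id in non_relevant:
            non_rel_above += 1
        elif doc_id in relevant:
            if N == 0:
                score += 1.0
            else:
                # Penalty grows with the number of judged-non-relevant
                # abstracts ranked above this relevant one (capped at R).
                score += 1.0 - min(non_rel_above, R) / min(R, N)
    return score / R


# Hypothetical example: one claim, two relevant and one non-relevant abstract.
ranked = ["a3", "a7", "a1", "a9", "a2"]       # system ranking for the claim
gold_rel, gold_nonrel = {"a1", "a2"}, {"a9"}  # illustrative judgments
print(recall_at_k(ranked, gold_rel, k=3))     # 0.5
print(bpref(ranked, gold_rel, gold_nonrel))   # 0.5
```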


Figures (8)

  • Figure 1: Instance from ClimateCheck 2026. Task 1: Given a claim, systems must retrieve relevant abstracts and use them for verification. Task 2: Given a claim, systems must predict all disinformation narratives associated with it.
  • Figure 2: Per-claim average Recall@5 across all submitted systems, sorted in ascending order. Colours indicate retrieval difficulty.
  • Figure 3: Confusion matrices for the baseline, ClimateSense, and DFKI-IML predictions on Task 1.2, normalised by claim. SUP = Supports, REF = Refutes, NEI = Not Enough Information.
  • Figure 4: Per-label F1 scores across participating systems and IAA (Krippendorff's $\alpha$, leftmost column). Labels sorted by mean system F1 (ascending); only labels with more than 3 instances in the test set are shown.
  • Figure 5: Refutation accuracy by narrative group based on the CARDS taxonomy for the baseline and ClimateSense systems.
  • ...and 3 more figures