ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution

Sungduk Yu; Brian L. White; Anahita Bhiwandiwalla; Musashi Hinck; Matthew Lyle Olson; Yaniv Gurwicz; Raanan Y. Rohekar; Tung Nguyen; Vasudev Lal

TL;DR

ClimDetect provides a large, ML-ready benchmark for climate change detection and attribution. It pairs daily CMIP6 input fields of $tas$ (near-surface air temperature), $huss$ (near-surface specific humidity), and $pr$ (precipitation) on a $64 \times 128$ grid ($X \in \mathbb{R}^{64 \times 128 \times 3}$) with scalar targets $y \in \mathbb{R}$ such as $AGMT$ (annual global mean temperature). The framework trains a regression model $y = F_{\theta}(X)$ on CMIP6 data, then tests its predictions against the natural variability distribution $P(y_{hist})$, using the Year of Emergence ($YoE$) and emergence fraction thresholds to decide when a signal is detected. The authors benchmark four Vision Transformers and traditional baselines (ridge regression, MLP, CNN) on ClimDetect and on real-world reanalysis (ERA5, JRA-3Q, MERRA-2), showing that ViTs often yield lower RMSE and earlier YoE, i.e., greater detection sensitivity. The dataset and accompanying benchmarks are openly accessible via Hugging Face to promote reproducibility, comparability, and accelerated ML-driven climate change research.
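The train-then-test workflow can be made concrete with a minimal NumPy sketch under stated assumptions. It uses ridge regression (one of the baselines named above) as $F_\theta$, solved in closed form in its dual so that only an $n \times n$ system is needed for the $64 \times 128 \times 3 = 24{,}576$ input features, and it flags a prediction as a detected signal when it exceeds the 97.5th percentile of $P(y_{hist})$, the upper bound described in the Figure 2 caption. The function names, the dual-form solver, and all data below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Ridge regression F_theta on flattened daily fields, solved in dual form.

    X: (n, 64, 128, 3) inputs (tas, huss, pr); y: (n,) targets such as AGMT.
    The dual form solves an n x n system instead of a p x p one (p = 24,576).
    """
    Xf = X.reshape(len(X), -1)
    x_mean, y_mean = Xf.mean(axis=0), y.mean()
    Xc = Xf - x_mean                      # center features so the intercept is separate
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(Xc)), y - y_mean)
    w = Xc.T @ alpha                      # primal weights, shape (p,)
    return w, y_mean - x_mean @ w         # weights and intercept

def predict(X, w, b):
    """Apply the fitted linear model to a batch of fields."""
    return X.reshape(len(X), -1) @ w + b

def exceeds_natural_variability(y_pred, y_hist, upper_q=97.5):
    """One-sided test: True where a prediction lies above the upper percentile of P(y_hist)."""
    return np.asarray(y_pred) > np.percentile(y_hist, upper_q)

# Synthetic stand-in data (hypothetical): random fields whose channel-0 mean drives y.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 64, 128, 3))
y_train = X_train[..., 0].mean(axis=(1, 2)) + 14.0
w, b = fit_ridge(X_train, y_train, lam=1e-3)

# In practice y_hist would be predict(X_hist, w, b) on pre-warming (1850-1949) fields;
# here it is drawn directly for illustration.
y_hist = rng.normal(14.0, 0.01, size=36_500)
flags = exceeds_natural_variability([13.99, 14.05], y_hist)
```

A prediction above the 97.5th percentile rejects, at a one-sided 2.5% level, the null hypothesis that the day's field comes from natural variability alone; the neural detectors in the benchmark would replace `fit_ridge`/`predict` while the test stays the same.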

Abstract

Detecting and attributing temperature increases driven by climate change is crucial for understanding global warming and informing adaptation strategies. However, distinguishing human-induced climate signals from natural variability remains challenging for traditional detection and attribution (D&A) methods, which rely on identifying specific "fingerprints": spatial patterns expected to emerge from external forcings such as greenhouse gas emissions. Deep learning offers promise in discerning these complex patterns within expansive spatial datasets, yet the lack of standardized protocols has hindered consistent comparisons across studies. To address this gap, we introduce ClimDetect, a standardized dataset comprising 1.17M daily climate snapshots paired with target climate change indicator variables. The dataset is curated from both CMIP6 climate model simulations and real-world observation-assimilated reanalysis datasets (ERA5, JRA-3Q, and MERRA-2), and is designed to enhance model accuracy in detecting climate change signals. ClimDetect integrates various input and target variables used in previous research, ensuring comparability and consistency across studies. We also explore the application of vision transformers (ViT) to climate data, a novel approach that, to our knowledge, has not previously been attempted for climate change detection tasks. Our open-access data serve as a benchmark for advancing climate science by enabling end-to-end model development and evaluation. ClimDetect is publicly accessible via the Hugging Face dataset repository at: https://huggingface.co/datasets/ClimDetect/ClimDetect.

Paper Structure

This paper contains 30 sections, 13 figures, and 9 tables.

Introduction
Related Work
Climate Detection and Attribution Studies
Climate Datasets for ML
ClimDetect Dataset
Variables
Data Source
Data Collection
Postprocessing
Dataset Split
Dataset Access
Framework for Climate Change Detection
Benchmark
Experiments
Baseline Models and Training Details
...and 15 more sections

Figures (13)

Figure 1: Overview of the machine learning pipeline for climate change detection and attribution using the ClimDetect dataset. The diagram illustrates the workflow from the input daily climate model variables (surface air temperature, humidity, precipitation), through a neural network model, to the target annual global mean temperature (AGMT). Climate field maps are color-coded by independent dataset: the training dataset in orange, the historical (i.e., pre-warming) dataset in green, and the observation dataset in purple. $F_\theta$ denotes a detection model (e.g., vision transformer, CNN), where $\theta$ represents the model parameters. Each purple dot represents an individual estimate from a single observation sample. For details, see the section on the detection method.
Figure 2: Detection model: ViT-b/16; experiment: "tas_mr". (Left) Model-predicted test statistic (AGMT) from three reanalysis datasets, displayed as 365 black dots per year, with the annual mean shown as a colored line. The red lines indicate the 2.5th to 97.5th percentile range of natural variability for the test statistic, estimated from the 1850-1949 CMIP6 model simulation output (the test split). (Right) Emergence fraction (EF) per year, defined as the fraction of days within a year on which the predicted AGMT exceeds the upper bound (the 97.5th percentile of natural variability). A centered 5-year moving average is applied to the EF time series. (Bottom right) The black line is the average of the three colored lines shown in the upper panels. The Year of Emergence (YoE) is calculated from this average as the first year in which the averaged EF surpasses the 97.5% threshold (blue line), which corresponds to 356 days.
Figure 3: Year of emergence (YoE), defined as the first year in which at least 97.5% of daily climate fields show a climate change signal distinguishable from natural variability. Grey bars indicate cases where a model failed to capture a YoE within the 1980-2024 reanalysis period. The "pr" experiment is omitted because no detection model captures a YoE for it.
Figure 4: Visualization of Integrated Gradients (IG) times Input for the "tas-huss-pr_mr" experiment, highlighting regions that influence the prediction of AGMT. IG$\times$Input visualizations for other experiments are provided in the appendix.
Figure 5: Probability density of AGMT in the training split over three time periods: 1850–2100 (blue), 1900–2100 (orange), and 1950–2100 (green).
...and 8 more figures
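The emergence-fraction and YoE definitions in the Figure 2 and Figure 3 captions can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names are hypothetical, `upper` is assumed to be the 97.5th-percentile bound of natural variability computed beforehand, and the edge handling of the centered 5-year moving average (averaging only the years available near the ends of the record) is an assumption because the captions do not specify it.

```python
import numpy as np

def emergence_fraction(daily_pred, years, upper):
    """Per-year fraction of days whose predicted AGMT exceeds the natural-variability upper bound."""
    daily_pred, years = np.asarray(daily_pred), np.asarray(years)
    uniq = np.unique(years)
    ef = np.array([(daily_pred[years == yr] > upper).mean() for yr in uniq])
    return uniq, ef

def centered_moving_average(x, window=5):
    """Centered moving average; near the edges only the available neighbours are averaged."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    return np.array([x[max(0, i - half): i + half + 1].mean() for i in range(len(x))])

def year_of_emergence(years, ef, threshold=0.975):
    """First year whose EF surpasses the threshold; None if it never does within the record."""
    idx = np.flatnonzero(np.asarray(ef) > threshold)
    return int(years[idx[0]]) if idx.size else None

# Hypothetical usage with one daily prediction series per reanalysis product:
# efs = [centered_moving_average(emergence_fraction(p, yrs, upper)[1])
#        for p in (pred_era5, pred_jra3q, pred_merra2)]
# yoe = year_of_emergence(np.unique(yrs), np.mean(efs, axis=0))
```

The strict `>` comparison matches the 356-day reading of the threshold in a 365-day year: 356/365 ≈ 0.9753 exceeds 0.975, while 355/365 ≈ 0.9726 does not. A `None` result corresponds to the grey bars in Figure 3, where no emergence occurs within the reanalysis period.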
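Figure 4 attributes AGMT predictions to input grid cells with Integrated Gradients. As a hedged illustration of the technique itself (not the authors' code or model), the sketch below computes IG for a hypothetical one-hidden-layer network with an analytic gradient, approximating the path integral from a zero baseline with a midpoint Riemann sum. With a zero baseline, the IG factor $(x - x')$ reduces to multiplying the averaged gradient by the input, which is one reading of "IG times Input"; the baseline, step count, and grid size used here are assumptions.

```python
import numpy as np

def toy_model(x, W, v):
    """Hypothetical stand-in for F_theta: one hidden tanh layer on a flattened field."""
    return float(v @ np.tanh(W @ x))

def toy_model_grad(x, W, v):
    """Analytic gradient of toy_model with respect to its input x."""
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h ** 2))

def integrated_gradients(x, grad_fn, baseline=None, steps=64):
    """Integrated Gradients along the straight path from baseline to x (midpoint Riemann sum)."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

# Synthetic example on a small hypothetical grid (ClimDetect fields are 64 x 128 x 3).
rng = np.random.default_rng(0)
shape = (8, 16, 3)
p = int(np.prod(shape))
W = rng.standard_normal((16, p)) / np.sqrt(p)
v = rng.standard_normal(16)
x = rng.standard_normal(p)
attr = integrated_gradients(x, lambda z: toy_model_grad(z, W, v), steps=256)
attr_map = attr.reshape(shape)  # attribution per grid cell and input variable
# Completeness: the attributions should sum to F(x) - F(baseline).
print(attr.sum(), toy_model(x, W, v) - toy_model(np.zeros(p), W, v))
```

The completeness check is a quick way to confirm that the number of integration steps is large enough before mapping `attr_map` onto latitude and longitude.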
