CarbonBench: A Global Benchmark for Upscaling of Carbon Fluxes Using Zero-Shot Learning

Aleksei Rozanov, Arvind Renganathan, Yimeng Zhang, Vipin Kumar

TL;DR

CarbonBench is the first benchmark for zero-shot spatial transfer in carbon flux upscaling. It aims to enable systematic comparison of transfer learning methods, serve as a testbed for regression under distribution shift, and contribute to next-generation climate modeling efforts.

Abstract

Accurately quantifying terrestrial carbon exchange is essential for climate policy and carbon accounting, yet models must generalize to ecosystems underrepresented in sparse eddy covariance observations. Although this challenge is a natural instance of zero-shot spatial transfer learning for time series regression, no standardized benchmark exists to rigorously evaluate model performance across geographically distinct locations with different climate regimes and vegetation types. We introduce CarbonBench, the first benchmark for zero-shot spatial transfer in carbon flux upscaling. CarbonBench comprises over 1.3 million daily observations from 567 flux tower sites globally (2000-2024). It provides: (1) stratified evaluation protocols that explicitly test generalization across unseen vegetation types and climate regimes, separating spatial transfer from temporal autocorrelation; (2) a harmonized set of remote sensing and meteorological features to enable flexible architecture design; and (3) baselines ranging from tree-based methods to domain-generalization architectures. By bridging machine learning methodologies and Earth system science, CarbonBench aims to enable systematic comparison of transfer learning methods, serve as a testbed for regression under distribution shift, and contribute to next-generation climate modeling efforts.
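
To make the idea of a zero-shot spatial split concrete, the following is a minimal sketch in Python, not the benchmark's actual protocol. It assumes each daily record carries a site identifier and each site has a vegetation class; the names `site_id`, `stratified_site_split`, `split_records`, and the toy site labels are hypothetical. Whole sites are held out, sampled within each class, so no test site contributes any observation to training.

```python
import random
from collections import defaultdict


def stratified_site_split(site_to_class, test_fraction=0.2, seed=0):
    """Assign whole sites to train or test, sampling test sites per class.

    site_to_class: dict mapping a (hypothetical) site id to its class label,
    e.g. an IGBP vegetation code. Returns (train_sites, test_sites) as sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for site, cls in sorted(site_to_class.items()):
        by_class[cls].append(site)

    train_sites, test_sites = set(), set()
    for cls, sites in by_class.items():
        rng.shuffle(sites)
        # Hold out at least one site per class, unless the class has a
        # single site (then it stays in training in this sketch).
        n_test = max(1, round(len(sites) * test_fraction)) if len(sites) > 1 else 0
        test_sites.update(sites[:n_test])
        train_sites.update(sites[n_test:])
    return train_sites, test_sites


def split_records(records, test_sites):
    """Route every record to train or test according to its site."""
    train = [r for r in records if r["site_id"] not in test_sites]
    test = [r for r in records if r["site_id"] in test_sites]
    return train, test


# Toy example with made-up sites; ENF and GRA are IGBP class codes.
site_to_class = {
    "S1": "ENF", "S2": "ENF", "S3": "ENF",
    "S4": "GRA", "S5": "GRA", "S6": "GRA",
}
records = [
    {"site_id": site, "day": day}
    for site in site_to_class
    for day in range(3)
]

train_sites, test_sites = stratified_site_split(site_to_class, test_fraction=0.34)
train, test = split_records(records, test_sites)
```

Splitting by site rather than by row is what separates spatial transfer from temporal autocorrelation: a random row-level split would place neighboring days from the same tower on both sides of the split.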

Paper Structure

This paper contains 41 sections, 10 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of the CarbonBench setup. Eddy covariance flux observations provide sparse footprint-level targets, while remote sensing and meteorological data serve as gridded inputs for spatial generalization.
  • Figure 2: Summary of data sample quality in CarbonBench. The upper panel shows the empirical distribution of $NEE\_VUT\_USTAR50\_QC$ across all samples. The lower left panel reports mean quality values aggregated by IGBP class, with bars indicating one standard deviation. The lower right panel shows the same aggregation for Köppen climate classes.
  • Figure 3: Temporal coverage of CarbonBench, showing the distribution of samples by year (left) and by month (right).
  • Figure 4: Distribution of the number of sites across IGBP and Köppen types, illustrating the large imbalance in geographical context among the available ground-truth observations.
  • Figure 5: Average fluxes aggregated by Köppen climate type, with bars representing the standard deviation (left axis), and the number of sites per climate type (right axis).
  • ...and 3 more figures