CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

TL;DR

CarbonSense addresses the lack of standardized data for data-driven carbon flux modelling by providing the first ML-ready multimodal dataset, which combines eddy covariance flux measurements with MODIS geospatial data from 385 sites and more than 27 million hourly observations. The authors introduce EcoPerceiver, a transformer-based multimodal architecture that uses windowed cross-attention to integrate meteorological, geospatial, and semantic inputs, and compare it against a strong XGBoost baseline. EcoPerceiver achieves higher Nash-Sutcliffe efficiency (NSE) and lower root mean squared error (RMSE) across most ecosystem types and generalizes better to out-of-distribution sites, suggesting that multimodal deep learning can improve carbon flux predictions. The dataset, baselines, and experimental guidelines are intended to promote reproducibility and accelerate progress in global carbon flux modelling, with potential impact on climate decision-making.
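For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how NSE and RMSE are conventionally computed. It uses the standard textbook definitions; the function names and the toy flux values are illustrative assumptions and are not taken from the paper's evaluation code.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total.

    1.0 is a perfect fit; 0.0 means the model is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical flux values, for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.5, 2.0, 2.5, 4.0]
    print("RMSE:", rmse(obs, pred))
    print("NSE:", nse(obs, pred))
```

Higher NSE and lower RMSE both indicate a better fit. NSE is normalized by the variance of the observations, which makes it easier to compare across sites or ecosystem types whose fluxes differ in magnitude.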

Abstract

Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer-based model. Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain. By providing these resources, we aim to lower the barrier to entry for other deep learning researchers to develop new models and drive new advances in carbon flux modelling.

Paper Structure (38 sections, 2 equations, 14 figures, 11 tables)

Figures (14)

  • Figure 1: Simplified EC station. Sensors measure atmospheric gas concentrations across eddies.
  • Figure 2: Global map of eddy covariance sites used in CarbonSense, with corresponding source networks. Some sites were present in multiple networks.
  • Figure 3: Data pipeline used to create CarbonSense from EC and MODIS data.
  • Figure 4: Overview of EcoPerceiver architecture.
  • Figure 5: Fourier input encoding for EcoPerceiver. Spectral inputs are similarly processed, but with a linear projection instead of Fourier encoding.
  • ...and 9 more figures