Probing forced responses and causality in data-driven climate emulators: conceptual limitations and the role of reduced-order models

Fabrizio Falasca

TL;DR

The paper examines how data-driven emulators can be designed to capture not only stationary climate variability but also forced responses, a key requirement for causal inference. It proposes a framework that combines reduced-order stochastic models with response theory, illustrating the approach first on a controlled triad model and then on a real-world SST–TOA radiative-flux system. Neural emulators with multiplicative noise outperform linear baselines in reproducing stationary statistics and perturbation responses, especially for the mean and variance, while partial-observation scenarios require careful variable selection and stochastic parameterization. The real-world example demonstrates that tailored reduced-order models can reveal meaningful SST–TOA causal links (pattern effects) and enable large-ensemble perturbation studies, though challenges remain in memory effects, autocorrelation, and data requirements. Overall, the work argues for coarse-grained, stochastic modeling, guided by response theory, as a principled path to improve causal understanding in multiscale climate systems, rather than relying on general-purpose, full-resolution emulators.

Abstract

A central challenge in climate science and applied mathematics is developing data-driven models of multiscale systems that capture both stationary statistics and responses to external perturbations. Current neural climate emulators aim to resolve the atmosphere-ocean system in all its complexity but often struggle to reproduce forced responses, limiting their use in causal studies such as Green's function experiments. To explore the origin of these limitations, we first examine a simplified dynamical system that retains key features of climate variability. We interpret the results through linear response theory, providing a rigorous framework to evaluate neural models beyond stationary statistics and to probe causal mechanisms. We argue that the ability of emulators of multiscale systems to reproduce perturbed statistics depends critically on (i) the choice of an appropriate coarse-grained representation and (ii) careful parameterizations of unresolved processes. These insights highlight reduced-order models, tailored to specific goals, processes, and scales, as valuable alternatives to general-purpose emulators. We next consider a real-world application by developing a neural model to investigate the joint variability of the surface temperature field and radiative fluxes. The model infers a multiplicative noise process directly from data, largely reproduces the system's probability distribution, and enables causal studies through forced responses. We discuss its limitations and outline directions for future work. Overall, these results expose key challenges in data-driven modeling of multiscale physical systems and underscore the value of coarse-grained, stochastic approaches, with response theory providing a principled framework to guide model design and enhance causal understanding.

Paper Structure

This paper contains 48 sections, 17 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: First row: stationary distributions of the variables $x_1(t)$, $x_2(t)$ and $x_3(t)$ as modeled by the original triad system in Eq. \ref{eq:triad_system}, the linear emulator in Eq. \ref{eq:LIM-triad} and the neural emulator in Eq. \ref{eq:ANN-triad}. Stationary distributions are obtained after running the three systems for $10^5$ model time units. Second row: same as the first row but for the autocorrelation functions.
  • Figure 2: Impulse response functions as modeled by the original triad system in Eq. \ref{eq:triad_system}, the linear emulator in Eq. \ref{eq:LIM-triad} and the neural emulator in Eq. \ref{eq:ANN-triad}. First row: response of the ensemble mean to impulse perturbations imposed on $x_1(0)$ at time $t = 0$. Second row: response of the ensemble variance to impulse perturbations imposed on $x_1(0)$ at time $t = 0$. Impulse response functions are computed using an ensemble of $N_e = 10^5$ members.
  • Figure 3: Response to a small logarithmic forcing $F = (0.01 \log(1+t), 0, 0)$ imposed on the right-hand side of the original triad system in Eq. \ref{eq:triad_system}, the linear emulator in Eq. \ref{eq:LIM-triad} and the neural emulator in Eq. \ref{eq:ANN-triad}. First row: response of the ensemble mean to the external forcing. Second row: response of the ensemble variance to the external forcing. Responses are computed using an ensemble of $N_e = 10^5$ members.
  • Figure 4: First row: stationary probability distribution (panel (a)), ensemble mean response to an impulse perturbation (panel (b)) and ensemble variance response to an impulse perturbation (panel (c)) for the variable $x_1$ in the triad system in Eq. \ref{eq:triad_system} and the variable $x$ in the neural scalar emulator with additive noise in Eq. \ref{eq:NN-scalar-additive}. Second row: same as the first row, but for the neural scalar emulator with multiplicative noise in Eq. \ref{eq:NN-scalar-multiplicative}. Impulse response functions are computed using an ensemble of $N_e = 10^6$ members.
  • Figure 5: Same as Figure \ref{fig:triad_impulse_reduced}, but here the scalar emulators in Eq. \ref{eq:NN-scalar-additive} and Eq. \ref{eq:NN-scalar-multiplicative} have been trained on a long trajectory of $x_1(t)$ after coarse-graining by averaging every 10 time steps. No additional preprocessing was performed for the original triad system in Eq. \ref{eq:triad_system}, i.e., the blue curves in this figure and in Figure \ref{fig:triad_impulse_reduced} are the same.
  • ...and 5 more figures
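
The ensemble impulse responses described in the captions above (perturb the initial condition at $t = 0$, then track how the ensemble mean and variance depart from the unperturbed run) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy scalar SDE with state-dependent noise stands in for the triad system and the emulators, whose equations are not reproduced on this page; the function name `impulse_response`, the Euler-Maruyama integrator, the step size and the ensemble size are hypothetical choices. Driving both ensembles with the same noise realizations is a variance-reduction choice of this sketch, not something stated in the source.

```python
import numpy as np

def impulse_response(drift, diffusion, x0, eps, n_steps, dt, seed=0):
    """Ensemble mean and variance response to an impulse eps added at t = 0.

    drift, diffusion: callables acting elementwise on the ensemble array.
    x0: initial ensemble, ideally sampled from the stationary distribution.
    Returns two arrays of length n_steps + 1 (perturbed minus unperturbed).
    """
    rng = np.random.default_rng(seed)
    x_ref = np.array(x0, dtype=float)  # unperturbed ensemble
    x_per = x_ref + eps                # same members, perturbed at t = 0
    d_mean = np.empty(n_steps + 1)
    d_var = np.empty(n_steps + 1)
    for k in range(n_steps + 1):
        d_mean[k] = x_per.mean() - x_ref.mean()
        d_var[k] = x_per.var() - x_ref.var()
        # Euler-Maruyama step; both ensembles share the same noise increments.
        dw = rng.normal(0.0, np.sqrt(dt), size=x_ref.shape)
        x_ref = x_ref + drift(x_ref) * dt + diffusion(x_ref) * dw
        x_per = x_per + drift(x_per) * dt + diffusion(x_per) * dw
    return d_mean, d_var

# Hypothetical toy model (NOT the paper's system): linear damping with
# multiplicative, state-dependent noise.
drift = lambda x: -x
diffusion = lambda x: np.sqrt(0.5 + 0.2 * x**2)

# Spin up an ensemble so that x0 approximates the stationary distribution.
dt = 0.01
spin_rng = np.random.default_rng(1)
x = spin_rng.normal(size=20_000)
for _ in range(500):
    dw = spin_rng.normal(0.0, np.sqrt(dt), size=x.shape)
    x = x + drift(x) * dt + diffusion(x) * dw

d_mean, d_var = impulse_response(drift, diffusion, x, eps=0.1, n_steps=300, dt=dt)
print("mean response at t=0 and at the final step:", d_mean[0], d_mean[-1])
print("largest variance response in magnitude:", np.abs(d_var).max())
```

With purely additive noise and a linear drift, the shared-noise construction makes the variance response vanish, so a nonzero `d_var` in this toy setting comes from the state-dependent diffusion term. The ensemble here is much smaller than the $N_e = 10^5$ and $10^6$ members quoted in the captions, so the estimates are correspondingly noisier.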