Probing forced responses and causality in data-driven climate emulators: conceptual limitations and the role of reduced-order models
Fabrizio Falasca
TL;DR
The paper examines how data-driven emulators can be designed to capture not only stationary climate variability but also forced responses, which are key to causal inference. It proposes a framework that blends reduced-order stochastic models with response theory, illustrating the approach first on a controlled triad model and then on a real-world SST–TOA radiative-flux system. Neural emulators with multiplicative noise outperform linear baselines in reproducing stationary statistics and perturbation responses, especially for the mean and variance, while partial-observation scenarios require careful variable selection and stochastic parameterization. The real-world example demonstrates that tailored reduced-order models can reveal meaningful SST–TOA causal links (pattern effects) and enable large-ensemble perturbation studies, though challenges remain regarding memory effects, autocorrelation, and data requirements. Overall, the work argues that coarse-grained stochastic modeling, guided by response theory, offers a principled path to improved causal understanding in multiscale climate systems, in contrast to relying on general-purpose, full-resolution emulators.
Abstract
A central challenge in climate science and applied mathematics is developing data-driven models of multiscale systems that capture both stationary statistics and responses to external perturbations. Current neural climate emulators aim to resolve the atmosphere-ocean system in all its complexity but often struggle to reproduce forced responses, limiting their use in causal studies such as Green's function experiments. To explore the origin of these limitations, we first examine a simplified dynamical system that retains key features of climate variability. We interpret the results through linear response theory, providing a rigorous framework to evaluate neural models beyond stationary statistics and to probe causal mechanisms. We argue that the ability of emulators of multiscale systems to reproduce perturbed statistics depends critically on (i) the choice of an appropriate coarse-grained representation and (ii) careful parameterizations of unresolved processes. These insights highlight reduced-order models, tailored to specific goals, processes, and scales, as valuable alternatives to general-purpose emulators. We next consider a real-world application by developing a neural model to investigate the joint variability of the surface temperature field and radiative fluxes. The model infers a multiplicative noise process directly from data, largely reproduces the system's probability distribution, and enables causal studies through forced responses. We discuss its limitations and outline directions for future work. Overall, these results expose key challenges in data-driven modeling of multiscale physical systems and underscore the value of coarse-grained, stochastic approaches, with response theory providing a principled framework to guide model design and enhance causal understanding.
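To make the idea of probing a model through its forced response concrete, here is a minimal sketch rather than the paper's method. It estimates the response of the ensemble mean to a small impulse perturbation by integrating paired perturbed and unperturbed ensembles with shared noise (Euler-Maruyama). The one-dimensional model, the function name `impulse_response`, and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def impulse_response(drift, diffusion, x0, eps, n_steps, dt, seed=0):
    """Mean response to an impulse of size eps applied at t = 0.

    Hypothetical helper: integrates dx = drift(x) dt + diffusion(x) dW
    for an unperturbed and a perturbed ensemble driven by the same noise,
    and returns the ensemble-mean difference divided by eps at each step.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)   # unperturbed ensemble
    y = x + eps                     # perturbed ensemble
    resp = np.empty(n_steps + 1)
    resp[0] = np.mean(y - x) / eps
    for k in range(1, n_steps + 1):
        # Shared Wiener increments reduce sampling noise in the difference.
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift(x) * dt + diffusion(x) * dw
        y = y + drift(y) * dt + diffusion(y) * dw
        resp[k] = np.mean(y - x) / eps
    return resp

# Assumed toy parameters (not from the paper).
a, sigma, dt, n_steps, eps = 0.5, 0.3, 0.01, 200, 0.01
init_rng = np.random.default_rng(1)
# Initial conditions drawn from the stationary law of the additive-noise case.
x0 = init_rng.normal(0.0, sigma / np.sqrt(2.0 * a), size=1000)

# Linear drift, additive noise: the discrete response decays as (1 - a*dt)**k.
resp_linear = impulse_response(lambda x: -a * x,
                               lambda x: sigma * np.ones_like(x),
                               x0, eps, n_steps, dt)

# Same drift with a state-dependent (multiplicative) noise amplitude.
resp_mult = impulse_response(lambda x: -a * x,
                             lambda x: sigma * np.sqrt(1.0 + x**2),
                             x0, eps, n_steps, dt)
```

In the additive-noise case the shared noise cancels in the difference, so `resp_linear` follows the deterministic decay of the Euler scheme. With multiplicative noise the estimate depends on the ensemble size. In an emulator setting, the same comparison would be made between the response of the learned model and that of the reference system.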
