Machine Learning Workflows in Climate Modeling: Design Patterns and Insights from Case Studies
Tian Zheng, Subashree Venkatasubramanian, Shuolin Li, Amy Braverman, Xinyi Ke, Zhewen Hou, Peter Jin, Samarth Sanjay Agrawal
TL;DR
This work analyzes how machine learning can be integrated into climate modeling through modular, science-guided workflows that couple physics, data, and ML. It delineates four guiding directions (Physics-First, Data-First, ML-First, and Human-in-the-Loop) and presents eight case studies, ranging from neural-operator surrogates to simulation-based inference and transfer learning for sparse observations. The authors emphasize reproducibility, interpretability, online stability, and uncertainty quantification as core design goals. They propose a structured workflow of design, development, deployment, and evaluation phases for building scientifically robust ML workflows. The paper highlights recurring patterns such as modularity, physics-informed constraints, and the crucial distinction between offline skill and online deployment performance. It also outlines open challenges in standardization, scalability, and deeper integration with scientific reasoning.
Abstract
Machine learning is increasingly applied in climate modeling to accelerate system emulation, infer parameters from data, produce forecasts, and support knowledge discovery. These applications must address challenges such as physical consistency, multi-scale coupling, data sparsity, robust generalization, and integration with scientific workflows. This paper analyzes a series of case studies from applied machine learning research in climate modeling, focusing on design choices and workflow structure. Rather than reviewing technical details, we synthesize workflow design patterns across diverse projects in ML-enabled climate modeling, spanning surrogate modeling, ML parameterization, probabilistic programming, simulation-based inference, and physics-informed transfer learning. We unpack how these workflows are grounded in physical knowledge, informed by simulation data, and designed to integrate observations. We aim to offer a framework for ensuring rigor in scientific machine learning through transparent model development, critical evaluation, informed adaptation, and reproducibility, and to help lower the barrier to interdisciplinary collaboration at the interface of data science and climate modeling.
