Defining error accumulation in ML atmospheric simulators
Raghul Parthipan, Mohit Anand, Hannah M. Christensen, J. Scott Hosking, Damon J. Wischik
TL;DR
This work defines error accumulation for autoregressive ML atmospheric simulators and introduces a KL-divergence-based metric that compares a generative model to a CTS reference, separating fixable model deficiencies from the effects of intrinsic chaos and unobserved variables. It further proposes a regularization strategy, guided by the error-accumulation metric, that adds a KL penalty to the likelihood objective, and validates the approach on Lorenz-63, Lorenz-96, and ERA5-based weather data. Results show improvements in RMSE and spread/skill, and the error-accumulation signal gives diagnostic insight into where models may be improved and how CTS quality influences the signal. The findings highlight practical impacts for ensemble forecasting and emphasize CTS improvements as a key lever for advancing ML-based weather prediction, while noting computational and methodological limitations.
Abstract
Machine learning (ML) has recently shown significant promise in modelling atmospheric systems, such as the weather. Many of these ML models are autoregressive, and error accumulation in their forecasts is a key problem. However, there is no clear definition of what 'error accumulation' actually entails. In this paper, we propose a definition and an associated metric to measure it. Our definition distinguishes between errors which are due to model deficiencies, which we may hope to fix, and those due to the intrinsic properties of atmospheric systems (chaos, unobserved variables), which are not fixable. We illustrate the usefulness of this definition by proposing a simple regularization loss penalty inspired by it. This approach shows performance improvements (according to RMSE and spread/skill) in a selection of atmospheric systems, including the real-world weather prediction task.
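To make the idea of a KL penalty added to a likelihood objective concrete, here is a minimal sketch and not the paper's implementation. It assumes that both the model and the reference produce diagonal Gaussian predictive distributions, that the penalty is KL(model || reference), and that a weight `lam` balances the two terms. The function names, the KL direction, and the Gaussian form are all illustrative assumptions; none of them comes from the text above.

```python
import numpy as np


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under a diagonal Gaussian, summed over the last axis."""
    return (0.5 * np.log(2.0 * np.pi * sigma**2) + (x - mu) ** 2 / (2.0 * sigma**2)).sum(axis=-1)


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) between diagonal Gaussians, summed over the last axis."""
    var_p, var_q = sigma_p**2, sigma_q**2
    kl = np.log(sigma_q / sigma_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return kl.sum(axis=-1)


def regularized_loss(x, mu_model, sigma_model, mu_ref, sigma_ref, lam=0.1):
    """Mean NLL plus lam times the mean KL between model and reference predictions.

    Assumption: the KL direction (model || reference) and the weight lam
    are hypothetical choices made for this sketch only.
    """
    nll = gaussian_nll(x, mu_model, sigma_model).mean()
    kl = gaussian_kl(mu_model, sigma_model, mu_ref, sigma_ref).mean()
    return nll + lam * kl


# Toy batch: 4 samples of a 3-dimensional state (e.g. a Lorenz-63-sized state).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
mu_model, sigma_model = np.zeros((4, 3)), np.ones((4, 3))
mu_ref, sigma_ref = np.full((4, 3), 0.5), np.full((4, 3), 2.0)
loss = regularized_loss(x, mu_model, sigma_model, mu_ref, sigma_ref, lam=0.1)
```

Setting `lam=0` recovers the plain likelihood objective, so in this sketch the penalty only adds a term pulling the model's predictive distribution towards the reference.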
