PuYun-LDM: A Latent Diffusion Model for High-Resolution Ensemble Weather Forecasts
Lianjun Wu, Shengchen Zhu, Yuxuan Liu, Liuyu Kai, Xiaoduan Feng, Duomin Wang, Wenshuo Liu, Jingxuan Zhang, Kelvin Li, Bin Wang
TL;DR
PuYun-LDM tackles the diffusability challenges of high-resolution latent diffusion models for ensemble weather forecasting by introducing two key components: temporally informed conditioning via a 3D Masked AutoEncoder (3D-MAE) and variable-aware spectral regularization via Variable-Aware Masked Frequency Modeling (VA-MFM). The framework models the transition distribution $p(X_t|X_{t-1})$ with a conditional diffusion process, embedding temporal evolution in the latent space and balancing spectral content across meteorological variables. Empirical results on ERA5-based experiments show that PuYun-LDM achieves better RMSE and CRPS than ENS at short lead times and remains competitive at longer horizons, with efficient parallel ensemble generation enabling practical global 15-day forecasts on NVIDIA H200 GPUs. This work provides a principled pathway for applying latent diffusion models to atmospheric fields by integrating physics-informed temporal conditioning and variable-aware spectral regularization, addressing both diffusability and heterogeneity in multivariate weather data.
Abstract
Latent diffusion models (LDMs) suffer from limited diffusability in high-resolution (≤0.25°) ensemble weather forecasting, where diffusability characterizes how easily a latent data distribution can be modeled by a diffusion process. Unlike natural images, meteorological fields lack task-agnostic foundation models and explicit semantic structures, making regularization based on vision foundation models (VFMs) inapplicable. Moreover, existing frequency-based approaches assume spectral homogeneity and impose identical spectral regularization across channels, which yields uneven regularization strength given the inter-variable spectral heterogeneity of multivariate meteorological data. To address these challenges, we propose a 3D Masked AutoEncoder (3D-MAE) that encodes weather-state evolution features as an additional conditioning signal for the diffusion model, together with a Variable-Aware Masked Frequency Modeling (VA-MFM) strategy that adaptively selects thresholds based on the spectral energy distribution of each variable. We combine these components into PuYun-LDM, which enhances latent diffusability and outperforms ENS at short lead times while remaining comparable to ENS at longer horizons. PuYun-LDM generates a 15-day global forecast with a 6-hour temporal resolution in five minutes on a single NVIDIA H200 GPU, while ensemble forecasts can be efficiently produced in parallel.
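To make the idea of a per-variable threshold concrete, the following is a minimal sketch, not the VA-MFM procedure itself. It assumes that the threshold for each variable is the smallest radial wavenumber at which the cumulative spectral energy reaches a fixed fraction. The function names, the `energy_fraction` parameter, and the synthetic fields are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cumulative_energy_profile(field):
    """Cumulative fraction of spectral energy per integer radial wavenumber."""
    h, w = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2
    # Integer wavenumbers along each axis.
    ky = np.fft.fftfreq(h) * h
    kx = np.fft.fftfreq(w) * w
    radius = np.rint(np.hypot(ky[:, None], kx[None, :])).astype(int)
    # Sum the power of all Fourier modes that share a radial wavenumber.
    energy = np.bincount(radius.ravel(), weights=power.ravel())
    return np.cumsum(energy) / energy.sum()


def variable_aware_cutoffs(fields, energy_fraction=0.9):
    """Per-variable cutoff wavenumber for an array of shape (V, H, W).

    Assumption (illustrative only): the cutoff is the smallest radial
    wavenumber whose cumulative energy reaches `energy_fraction`.
    Constant fields are not handled, since their anomaly has zero energy.
    """
    cutoffs = []
    for field in fields:
        # Remove the spatial mean so the zero-wavenumber mode does not dominate.
        profile = cumulative_energy_profile(field - field.mean())
        cutoffs.append(int(np.searchsorted(profile, energy_fraction)))
    return np.array(cutoffs)


# Two synthetic "variables" on a 32 x 64 grid: one smooth, one fine-scale.
x = np.arange(64)
smooth = np.tile(np.sin(2 * np.pi * 2 * x / 64), (32, 1))
rough = np.tile(np.sin(2 * np.pi * 12 * x / 64), (32, 1))
cutoffs = variable_aware_cutoffs(np.stack([smooth, rough]))
```

In this toy case the smooth field receives a much lower cutoff than the fine-scale field, whereas a single shared threshold would regularize one of the two too strongly or too weakly. How the selected thresholds enter the training objective is not specified here and is left out of the sketch.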
