Improving Tropical Cyclone Forecasting With Video Diffusion Models
Zhibo Ren, Pritthijit Nath, Pancham Shukla
TL;DR
This study addresses the challenge of tropical cyclone forecasting by capturing temporal evolution with video diffusion models rather than frame-by-frame predictions. A 3D UNet-based video diffusion framework conditioned on initial infrared frames and ERA5 data generates multi-frame forecasts (10 frames at a time) and uses a two-stage training curriculum to stabilize learning. Compared with the previous approach of Nath et al., the method improves MAE by 19.3%, PSNR by 16.2%, and SSIM by 36.1%, and it improves temporal coherence in particular, as measured by Fréchet Video Distance (FVD), while extending the reliable forecasting horizon from 36 to 50 hours. These results suggest broader potential for temporally coherent deep learning forecasts in weather prediction and motivate future work on multi-channel data, more frames per prediction, and physics-informed losses.
Abstract
Tropical cyclone (TC) forecasting is crucial for disaster preparedness and mitigation. While recent deep learning approaches have shown promise, existing methods often treat TC evolution as a series of independent frame-to-frame predictions, limiting their ability to capture long-term dynamics. We present a novel application of video diffusion models for TC forecasting that explicitly models temporal dependencies through additional temporal layers. Our approach enables the model to generate multiple frames simultaneously, better capturing cyclone evolution patterns. We introduce a two-stage training strategy that significantly improves individual-frame quality and performance in low-data regimes. Experimental results show that our method outperforms the previous approach of Nath et al. by 19.3% in MAE, 16.2% in PSNR, and 36.1% in SSIM. Most notably, we extend the reliable forecasting horizon from 36 to 50 hours. Through comprehensive evaluation using both traditional metrics and Fréchet Video Distance (FVD), we demonstrate that our approach produces more temporally coherent forecasts while maintaining competitive single-frame quality. Code is available at https://github.com/Ren-creater/forecast-video-diffmodels.
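To make the traditional metrics named above concrete, the following is a minimal sketch of how MAE and PSNR might be computed over a multi-frame forecast clip. It is not taken from the authors' repository: the function names, the (T, H, W) array layout, the [0, 1] pixel scaling, and the synthetic data are all assumptions made for illustration. SSIM and FVD are omitted because they need additional libraries or a pretrained video network.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error averaged over all frames and pixels."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; assumes pixel values span [0, data_range]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical clips have no noise
    return float(10.0 * np.log10(data_range ** 2 / mse))

def per_frame_mae(pred, target):
    """MAE of each frame in a (T, H, W) clip, showing how error changes with lead time."""
    return np.mean(np.abs(pred - target), axis=(1, 2))

# Synthetic 10-frame clip. The frame count follows the 10-frame forecasts
# described above; the 64x64 resolution and the noise model are placeholders,
# not the paper's data.
rng = np.random.default_rng(0)
target = rng.random((10, 64, 64))
noise_scale = np.linspace(0.01, 0.10, 10)[:, None, None]  # error grows with lead time
pred = np.clip(target + noise_scale * rng.standard_normal(target.shape), 0.0, 1.0)

print("clip MAE:", mae(pred, target))
print("clip PSNR (dB):", psnr(pred, target))
print("per-frame MAE:", per_frame_mae(pred, target))
```

Reporting the per-frame error alongside the clip-level averages is one simple way to see how forecast quality degrades with lead time, which is the kind of behaviour a statement about a reliable forecasting horizon summarizes.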
