Improving Tropical Cyclone Forecasting With Video Diffusion Models

Zhibo Ren, Pritthijit Nath, Pancham Shukla

TL;DR

This study addresses tropical cyclone forecasting by modeling temporal evolution with video diffusion models rather than with independent frame-by-frame predictions. A 3D U-Net-based video diffusion framework, conditioned on initial infrared frames and ERA5 data, generates 10 frames per forecast and uses a two-stage training curriculum to stabilize learning. Relative to the previous approach of Nath et al., it improves MAE by 19.3%, PSNR by 16.2%, and SSIM by 36.1%, improves temporal coherence as measured by Fréchet Video Distance (FVD), and extends the reliable forecasting horizon from 36 to 50 hours. These results suggest broader potential for temporally coherent deep learning forecasts in weather prediction and motivate future work on multi-channel data, more frames per prediction, and physics-informed losses.

Abstract

Tropical cyclone (TC) forecasting is crucial for disaster preparedness and mitigation. While recent deep learning approaches have shown promise, existing methods often treat TC evolution as a series of independent frame-to-frame predictions, limiting their ability to capture long-term dynamics. We present a novel application of video diffusion models for TC forecasting that explicitly models temporal dependencies through additional temporal layers. Our approach enables the model to generate multiple frames simultaneously, better capturing cyclone evolution patterns. We introduce a two-stage training strategy that significantly improves individual-frame quality and performance in low-data regimes. Experimental results show that our method outperforms the previous approach of Nath et al. by 19.3% in MAE, 16.2% in PSNR, and 36.1% in SSIM. Most notably, we extend the reliable forecasting horizon from 36 to 50 hours. Through comprehensive evaluation using both traditional metrics and Fréchet Video Distance (FVD), we demonstrate that our approach produces more temporally coherent forecasts while maintaining competitive single-frame quality. Code is available at https://github.com/Ren-creater/forecast-video-diffmodels.
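The abstract reports single-frame quality with MAE and PSNR. As a rough illustration of how such per-frame metrics can be computed over a multi-frame forecast, here is a minimal NumPy sketch. It is not taken from the authors' repository: the function names, the `(frames, height, width)` array layout, the `[0, 1]` pixel range, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error between two arrays of equal shape."""
    return float(np.mean(np.abs(pred - target)))


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed pixel range."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def per_frame(metric, pred_video, target_video):
    """Apply a single-frame metric to each frame of a (frames, H, W) video."""
    return [metric(p, t) for p, t in zip(pred_video, target_video)]


# Synthetic stand-in for a 10-frame forecast (hypothetical data, not real imagery).
rng = np.random.default_rng(0)
target_video = rng.random((10, 64, 64)) * 0.9   # values in [0, 0.9)
pred_video = target_video + 0.05                # constant error of 0.05 per pixel

frame_mae = per_frame(mae, pred_video, target_video)
frame_psnr = per_frame(psnr, pred_video, target_video)
print(f"mean MAE over frames:  {np.mean(frame_mae):.4f}")
print(f"mean PSNR over frames: {np.mean(frame_psnr):.2f} dB")
```

Scoring frame by frame, rather than once over the whole clip, makes it possible to see how quality changes with lead time. Metrics like these do not measure temporal coherence, which is the role FVD plays in the evaluation.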

Paper Structure

This paper contains 14 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Qualitative comparison of TC forecasting results on the first four generated frames. From top to bottom: (1) ground truth, (2) our VDM predictions, (3) the difference between the VDM predictions and ground truth, (4) Nath et al.'s predictions, and (5) the difference between Nath et al.'s predictions and ground truth. Our VDM method demonstrates improved temporal consistency and more accurate TC evolution patterns.
  • Figure A.1: Illustration of the model pipeline. Our VDM takes as input a noisy image $\mathbf{z}_t$, conditioning variables $\mathbf{c}$, and a timestep embedding $\lambda_t$, and progressively denoises the sample using a U-Net-based architecture with skip connections. ERA5 data from multiple timesteps ($t_1, t_2, t_3$) is used as conditioning information. The output $\hat{\mathbf{x}}$ represents the denoised prediction.
  • Figure C.2: SSIM values over the entire cyclonic duration. The dashed lines indicate the hourly marks at which the minimum SSIM values are obtained for each cyclone.