BeamVQ: Beam Search with Vector Quantization to Mitigate Data Scarcity in Physical Spatiotemporal Forecasting
Weiyan Wang, Xingjian Shi, Ruiqi Shu, Yuan Gao, Rui Ray Chen, Kun Wang, Fan Xu, Jinbao Xue, Shuaipeng Li, Yangyu Tao, Di Wang, Hao Wu, Xiaomeng Huang
TL;DR
BeamVQ tackles data scarcity in physical spatiotemporal forecasting by introducing a probabilistic framework that combines a deterministic base predictor with a Top-K VQ-VAE to generate diverse futures. It uses beam search over the continuous state space to explore multiple trajectory variants and employs a domain-specific metric to guide selection, enabling a self-ensemble that augments training data. Across meteorological, fluid dynamics, and PDE-based benchmarks, BeamVQ achieves substantial mean squared error reductions (up to 39%) and improves extreme-event detection and physical plausibility. The approach enhances long-horizon forecasting and provides a robust, uncertainty-aware toolkit for data-limited physical systems, with broad applicability to climate, ocean, and engineering simulations.
Abstract
In practice, physical spatiotemporal forecasting can suffer from data scarcity, because collecting large-scale data is non-trivial, especially for extreme events. To address this, we propose BeamVQ, a novel probabilistic framework that realizes iterative self-training with new self-ensemble strategies, achieving better physical consistency and generalization on extreme events. Following any base forecasting model, BeamVQ encodes the model's deterministic outputs into a latent space and retrieves multiple codebook entries to generate probabilistic outputs. BeamVQ then extends beam search from discrete spaces to the continuous state spaces of physical forecasting. We further employ domain-specific metrics (e.g., the Critical Success Index for extreme events) to select the top-k candidates, and we build a new self-ensemble strategy by combining these high-quality candidates. The self-ensemble not only improves inference quality and robustness but also iteratively augments the training dataset during continuous self-training. Consequently, BeamVQ can explore rare but critical phenomena beyond the original dataset. Comprehensive experiments on different benchmarks and backbones show that BeamVQ consistently reduces forecasting MSE (by up to 39%), improves extreme-event detection, and is effective in handling data scarcity.
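The abstract describes the candidate-generation and selection steps only at a high level. The following is a minimal, non-authoritative sketch of a single such step under several assumptions that are not taken from the paper: the VQ-VAE encoder and decoder are replaced by identity maps, "retrieving multiple codebook entries" is interpreted as taking the K nearest codebook vectors in squared Euclidean distance, candidates are scored with the Critical Success Index (CSI = hits / (hits + misses + false alarms)) against a ground-truth field (as would be available on training samples during self-training), and the self-ensemble is a plain mean of the m best-scoring candidates. The names `topk_codebook_candidates`, `csi`, and `self_ensemble` are hypothetical. The sketch omits the multi-step beam search and is not the authors' implementation.

```python
import numpy as np


def topk_codebook_candidates(z, codebook, k):
    """Return the k codebook entries closest to latent vector z (squared L2)."""
    dists = np.sum((codebook - z) ** 2, axis=1)  # distance to every entry
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return codebook[nearest]


def csi(pred, target, threshold):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    p = pred >= threshold                        # predicted extreme event
    t = target >= threshold                      # observed extreme event
    hits = np.sum(p & t)
    misses = np.sum(~p & t)
    false_alarms = np.sum(p & ~t)
    denom = hits + misses + false_alarms
    # CSI is undefined when no event is predicted or observed; use 0.0 here.
    return float(hits / denom) if denom > 0 else 0.0


def self_ensemble(x_det, target, encode, decode, codebook, k, m, threshold):
    """Hypothetical step: Top-K retrieval -> CSI ranking -> mean of best m."""
    z = encode(x_det)                            # deterministic output -> latent
    candidates = [decode(c) for c in topk_codebook_candidates(z, codebook, k)]
    scores = np.array([csi(c, target, threshold) for c in candidates])
    best = np.argsort(-scores, kind="stable")[:m]  # highest CSI first
    return np.mean([candidates[i] for i in best], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 4)
    x_det = rng.random(shape)                    # stand-in base-model forecast
    target = rng.random(shape)                   # stand-in ground truth
    codebook = rng.random((32, 16))              # 32 entries, latent dim 16
    encode = lambda x: x.ravel()                 # identity "encoder" (assumption)
    decode = lambda z: z.reshape(shape)          # identity "decoder" (assumption)
    out = self_ensemble(x_det, target, encode, decode, codebook,
                        k=8, m=3, threshold=0.7)
    print(out.shape)  # (4, 4)
```

Ranking by a threshold-based score such as CSI, rather than by MSE, favors candidates that place values above the extreme-event threshold in the right locations, which matches the abstract's motivation for using domain-specific metrics. Any other scalar metric could take the place of `csi` in this sketch.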
