Unlocking Out-of-Distribution Generalization in Dynamics through Physics-Guided Augmentation
Fan Xu, Hao Wu, Kun Wang, Nan Wang, Qingsong Wen, Xian Wu, Wei Gong, Xibin Zhao
TL;DR
SPARK tackles out-of-distribution generalization and data scarcity in dynamical systems by introducing physics-guided augmentation. It encodes physical priors into a discrete state dictionary via a reconstruction autoencoder and vector quantization, enabling principled latent-space interpolation for augmentation. Prediction uses a Fourier-enhanced Graph ODE with attention-based history encoding to capture long-term dynamics. The approach is supported by information-theoretic generalization bounds showing reduced dependence on training data when physical priors are included, and it demonstrates state-of-the-art performance across Prometheus, ERA5, Navier–Stokes, Spherical-SWE, and sea-ice tasks, including transfer to data-scarce domains. Overall, SPARK offers a robust, scalable framework for physics-informed dynamical modeling with strong OOD and transfer capabilities.
Abstract
In dynamical system modeling, traditional numerical methods are limited by high computational costs, while modern data-driven approaches struggle with data scarcity and distribution shifts. To address these limitations, we propose SPARK, a physics-guided quantitative augmentation plugin. Specifically, SPARK uses a reconstruction autoencoder to integrate physical parameters into a physics-rich discrete state dictionary. This dictionary then serves as a structured collection of physical states, enabling the creation of new, physically plausible training samples via principled interpolation in the latent space. For downstream prediction, these augmented representations are integrated with a Fourier-enhanced Graph ODE, a combination designed to robustly model the enriched data distribution while capturing long-term temporal dependencies. Extensive experiments on diverse benchmarks demonstrate that SPARK significantly outperforms state-of-the-art baselines, particularly in challenging out-of-distribution scenarios and data-scarce regimes, supporting the efficacy of our physics-guided augmentation paradigm.
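To make the augmentation idea concrete, the following is a minimal sketch of a discrete state dictionary lookup followed by latent-space interpolation. It is an illustration under stated assumptions, not SPARK's implementation: the abstract does not specify the interpolation rule, so blending each latent with its nearest dictionary entry is assumed here, and the names `quantize`, `interpolate`, `augment`, and the mixing weight `alpha` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z, codebook):
    """Return (index, vector) of the dictionary entry nearest to latent z."""
    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, codebook[idx]


def interpolate(z, code, alpha):
    """Convex blend alpha * z + (1 - alpha) * code, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * x + (1.0 - alpha) * c for x, c in zip(z, code)]


def augment(latents, codebook, alpha=0.5):
    """Assumed augmentation rule: pull each latent toward its nearest entry."""
    return [interpolate(z, quantize(z, codebook)[1], alpha) for z in latents]


# Toy data: 8 dictionary entries and 5 encoder outputs, all 4-dimensional.
# In practice these would come from a trained autoencoder, not random draws.
rng = random.Random(0)
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
latents = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
augmented = augment(latents, codebook, alpha=0.7)
```

With `alpha = 1` the latents are returned unchanged, and with `alpha = 0` they collapse onto dictionary entries; intermediate values produce new samples on the segment between an observed latent and a stored physical state.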
