Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model
Derek Jollie, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer
TL;DR
The paper tackles generalization gaps in time-series forecasting of spatiotemporal PDEs by leveraging an additional equation modality in a PDE foundation model. It replaces manual PROSE symbolic encoding with a SymPy-based standardization (SymPy tree) to automatically produce consistent token sequences, enabling more robust zero-shot extrapolation to new operators. A Bayesian Sequential Monte Carlo (SMC) particle-filter module is introduced to refine learned PDE coefficients, improving equation accuracy and long-term stability. Experiments across five symbolic-encoding settings demonstrate that the SymPy-based encoding yields superior robustness to term reordering and noise, while particle-filter refinement further reduces time-series errors, pointing to a scalable, automated approach for multimodal PDE-based forecasting.
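To illustrate why a SymPy-based encoding is robust to term reordering, the following is a minimal sketch that parses a PDE right-hand side with SymPy, relies on SymPy's canonical ordering of the arguments of `Add` and `Mul` to standardize it, and flattens the resulting tree into prefix tokens. The derivative symbol names (`u_x`, `u_xx`), the helper names `prefix_tokens` and `encode`, and the choice to emit each operator followed by its arity are hypothetical choices for illustration; the paper's actual token library may differ.

```python
import sympy as sp


def prefix_tokens(expr):
    """Flatten a SymPy expression tree into prefix (Polish) tokens.

    Hypothetical convention: an operator node emits its class name
    (e.g. 'Add', 'Mul', 'Pow') followed by its number of arguments.
    """
    if expr.is_Symbol:
        return [expr.name]
    if expr.is_Number:
        return [str(expr)]
    tokens = [type(expr).__name__, str(len(expr.args))]
    for arg in expr.args:
        tokens.extend(prefix_tokens(arg))
    return tokens


def encode(rhs):
    """Parse a right-hand side string, expand it, and tokenize the SymPy tree.

    Derivatives are written as plain symbols such as u_x and u_xx
    (an illustrative assumption, not the paper's notation).
    """
    expr = sp.expand(sp.sympify(rhs))
    return prefix_tokens(expr)


if __name__ == "__main__":
    # The same Burgers-type right-hand side written with different term orders.
    a = encode("0.01*u_xx - u*u_x")
    b = encode("-u_x*u + 0.01*u_xx")
    print(a)
    print(a == b)
```

Because SymPy sorts the arguments of commutative operators when it builds an expression, both orderings yield the same tree and therefore the same token list, with no manual rearrangement step.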
Abstract
Symbolic encoding has been used in multi-operator learning as a way to embed additional information for distinct time-series data. For spatiotemporal systems described by time-dependent partial differential equations, the equation itself provides an additional modality to identify the system. Using symbolic expressions alongside time-series samples allows for the development of multimodal predictive neural networks. A key challenge with current approaches is that the symbolic information, i.e., the equations, must be manually preprocessed (simplified, rearranged, etc.) to match the existing token library, which increases costs and reduces flexibility, especially when dealing with new differential equations. We propose a new token library based on SymPy to encode differential equations as an additional modality for time-series models. The proposed approach incurs minimal cost, is automated, and maintains high prediction accuracy for forecasting tasks. Additionally, we include a Bayesian filtering module that connects the different modalities to refine the learned equation. This improves the accuracy of both the learned symbolic representation and the predicted time series.
