Learning from History: A Retrieval-Augmented Framework for Spatiotemporal Prediction

Hao Jia, Penghao Zhao, Hao Wu, Yuan Gao, Yangyu Tao, Bin Cui

TL;DR

The paper tackles the difficulty of long-term, high-fidelity spatiotemporal forecasting in complex physical systems, where purely parametric deep learning models accumulate error and violate physical realism. It introduces Retrieval-Augmented Prediction (RAP), a three-stage framework (Retrieve, Augment, Predict) that uses historical analogs as non-parametric dynamic guidance by feeding the retrieved true future $\boldsymbol{Y}_{\text{ref}}$ into a dual-stream network alongside the current state $\boldsymbol{X}_{\text{query}}$. Unlike hard constraints, $\boldsymbol{Y}_{\text{ref}}$ serves as a conditional input that regularizes learning, with a loss that combines $\mathcal{L}_1$ and $\mathcal{L}_{\text{MSE}}$ but excludes the reference from the loss to avoid trivial copying. The authors validate RAP across ERA5 weather forecasting, 2D turbulence, and fire-spread simulations, showing consistent improvements over diverse baselines and enhanced physical fidelity, including sharper vortices and flame fronts in long-term rollouts. They also demonstrate robustness and scalability, including data-efficient benefits for large models and ablations that underscore the importance of the dual-stream integration and the role of history-derived guidance for stable, physically plausible predictions.
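As a rough illustration of the loss described above, the following sketch combines an L1 term and an MSE term computed only between the prediction and the ground truth. The function name `rap_loss` and the two weights are assumptions made for this example; the summary does not state how the terms are weighted. The retrieved reference is deliberately not an argument, which mirrors the statement that it is excluded from the loss.

```python
import numpy as np

def rap_loss(pred, target, l1_weight=1.0, mse_weight=1.0):
    """Weighted sum of L1 and MSE between prediction and ground truth.

    The weights are hypothetical defaults, not values from the paper.
    The retrieved reference never enters this function: it is only a
    conditional input to the network, not a supervision target.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    l1 = np.mean(np.abs(diff))   # mean absolute error term
    mse = np.mean(diff ** 2)     # mean squared error term
    return float(l1_weight * l1 + mse_weight * mse)
```

Because the reference is absent from the objective, the network cannot lower the loss merely by reproducing it; it is only rewarded for matching the actual future.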

Abstract

Accurate and long-term spatiotemporal prediction for complex physical systems remains a fundamental challenge in scientific computing. While deep learning models, as powerful parametric approximators, have shown remarkable success, they suffer from a critical limitation: the accumulation of errors during long-term autoregressive rollouts often leads to physically implausible artifacts. This deficiency arises from their purely parametric nature, which struggles to capture the full constraints of a system's intrinsic dynamics. To address this, we introduce a novel Retrieval-Augmented Prediction (RAP) framework, a hybrid paradigm that synergizes the predictive power of deep networks with the grounded truth of historical data. The core philosophy of RAP is to leverage historical evolutionary exemplars as a non-parametric estimate of the system's local dynamics. For any given state, RAP efficiently retrieves the most similar historical analog from a large-scale database. The true future evolution of this analog then serves as a reference target. Critically, this target is not a hard constraint in the loss function but rather a powerful conditional input to a specialized dual-stream architecture. It provides strong dynamic guidance, steering the model's predictions towards physically viable trajectories. In extensive benchmarks across meteorology, turbulence, and fire simulation, RAP not only surpasses state-of-the-art methods but also significantly outperforms a strong analog-only forecasting baseline. More importantly, RAP generates predictions that are more physically realistic by effectively suppressing error divergence in long-term rollouts.
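The retrieval step and the analog-only baseline mentioned in the abstract can be sketched as a brute-force nearest-neighbor search. This is a minimal sketch under stated assumptions: the Euclidean distance on flattened states, the array layouts, and the name `retrieve_analog` are all illustrative choices, since the abstract does not specify the similarity measure or the index used for efficient search.

```python
import numpy as np

def retrieve_analog(query, db_states, db_futures):
    """Find the historical state closest to `query` and return its future.

    Assumed (hypothetical) layouts:
      query:      (C, H, W)        current state
      db_states:  (N, C, H, W)     historical states
      db_futures: (N, T, C, H, W)  true evolution following each state
    Similarity is assumed to be Euclidean distance on flattened fields.
    """
    flat_db = db_states.reshape(len(db_states), -1)
    dists = np.linalg.norm(flat_db - query.reshape(1, -1), axis=1)
    idx = int(np.argmin(dists))
    # db_futures[idx] is the reference target. Used directly as the
    # forecast, it corresponds to the analog-only baseline.
    return idx, db_futures[idx]
```

In RAP the returned reference would instead be passed, together with the query, to the dual-stream network; a linear scan like this one is only practical for small databases.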

Paper Structure

This paper contains 41 sections, 10 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overview of the Retrieval-Augmented Prediction (RAP) framework. Given a Query Input, a similarity search is performed on a Historical Database to find the best analog, whose future evolution is designated as the Reference Target. This reference is then used in two ways: (a) directly as the prediction for the analog-only baseline, and (b) as a conditional input (dynamic guidance) to our Dual-Stream Fusion Model, which also takes the original query to generate the final RAP Prediction. Both predictions are evaluated against the Ground Truth Future.
  • Figure 2: Quantitative comparison of baselines versus their RAP-enhanced counterparts across diverse benchmarks. The figure provides a detailed performance comparison on the Weatherbench2 (top row) and Prometheus (bottom row) benchmarks, covering four key metrics: MSE ($\downarrow$), MAE ($\downarrow$), PSNR ($\uparrow$), and SSIM ($\uparrow$). It is evident that for all selected models, the RAP-enhanced versions (red) consistently and significantly outperform the original baseline models (blue) across all metrics. This result strongly validates the effectiveness and generality of our RAP framework, highlighting its core contribution of leveraging historical analogs as explicit dynamical guidance to improve the accuracy of complex spatiotemporal prediction tasks.
  • Figure 3: Visual comparison on Turbulence (left) and Prometheus (right). Unlike baseline models which blur results, RAP preserves sharp physical features like vortices and flame fronts.
  • Figure 4: Qualitative comparison on the Weatherbench dataset. The "with Encoder" variant corresponds to our full RAP framework, while the "without Encoder" variant is the baseline model.
  • Figure 5: Qualitative comparison of long-term predictions on the Prometheus fire spread simulation. Each column block compares the Ground Truth (top row of each block), the Baseline model (middle), and the RAP-enhanced model (bottom). The baseline models (UNet Baseline, CNO Baseline) exhibit significant smoothing artifacts and fail to capture the complex, sharp flame fronts characteristic of turbulent combustion. In contrast, our RAP-enhanced models (UNet+RAP, CNO+RAP) successfully suppress numerical dissipation and restore high-fidelity, physically plausible details that closely match the Ground Truth.
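For reference, three of the four metrics named in the Figure 2 caption (MSE, MAE, PSNR) have standard pointwise definitions, sketched below. The `data_range` argument is an assumption: PSNR depends on the value range of the fields, and the caption does not say how the data are normalized. SSIM is omitted because it needs a windowed implementation that does not fit a short sketch.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error (lower is better)."""
    return float(np.mean((pred - target) ** 2))

def mae(pred, target):
    """Mean absolute error (lower is better)."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    `data_range` is the assumed span of valid values (max - min).
    """
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical fields
    return float(10.0 * np.log10(data_range ** 2 / err))
```

Since PSNR is a monotone function of MSE for a fixed data range, the two always rank models the same way; MAE and SSIM add information about outlier sensitivity and structural similarity respectively.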