Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks

Xuanle Zhao, Yue Sun, Tielin Zhang, Bo Xu

TL;DR

This work addresses the challenge of accurate spatiotemporal prediction under data constraints by introducing a physical-guided neural network that combines a frequency-enhanced Fourier pathway, a moment loss, and an adaptive PDE-guided Runge-Kutta updater. The model fuses Transformer-based spatial corrections with Fourier-based physical representations, and updates latent states with an adaptive RK2 scheme to enforce PDE-consistent dynamics. Across diverse spatiotemporal and video benchmarks, it achieves state-of-the-art or competitive performance while using significantly fewer parameters, demonstrating the practical value of physics-informed design in dynamic forecasting.

Abstract

Spatiotemporal prediction plays an important role in modeling natural phenomena and processing video frames, especially in weather forecasting and human action recognition. Recent work incorporates prior physical knowledge into deep learning frameworks to estimate the unknown governing partial differential equations (PDEs), and has shown promising results on spatiotemporal prediction tasks. However, previous approaches acquire physical or PDE features only by restricting the network architecture or the loss function, which reduces the representational capacity of the network. Moreover, they do not effectively estimate how the physical state is updated. To address these problems, this paper proposes a physical-guided neural network that uses a frequency-enhanced Fourier module and a moment loss to strengthen the model's ability to estimate spatiotemporal dynamics. Furthermore, we propose an adaptive second-order Runge-Kutta method with physical constraints to model the physical states more precisely. We evaluate our model on both spatiotemporal and video prediction tasks. The experimental results show that our model outperforms state-of-the-art methods, achieving the best results on several datasets with a much smaller parameter count.
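
For reference, the classical second-order Runge-Kutta (Heun) scheme, which an adaptive second-order updater would presumably generalize, can be written as follows for a latent state $u_t$ governed by $\partial u / \partial t = F(u)$. The symbols $F$, $\Delta t$, $k_1$, and $k_2$ are illustrative notation chosen here; the abstract does not give the paper's exact formulation.

```latex
\begin{aligned}
k_1 &= F(u_t), \\
k_2 &= F(u_t + \Delta t \, k_1), \\
u_{t+1} &= u_t + \frac{\Delta t}{2} \left( k_1 + k_2 \right).
\end{aligned}
```

The fixed weights of $1/2$ on the two slopes are what make this scheme non-adaptive; an adaptive variant would let those weights depend on the state.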

Paper Structure

This paper contains 17 sections, 19 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: The overall network architecture. The input frame is first embedded into patches. Then the previous hidden state and patches are concatenated and processed in TCM and FRM in parallel to output $u_t^{TCM}$ and $u_t^{F}$. Finally, the representations are added and utilized in the ARKM for updating.
  • Figure 2: The detailed structure of the Swin Transformer Block.
  • Figure 3: The detailed structure of the Fourier-based residual block. In each block, the Fourier layer and MLP are computed serially.
  • Figure 4: The detailed structure of the adaptive second-order Runge-Kutta module. The derivatives of the input feature are first computed with the same differential approximator $\text{F}$, and the state is then updated through the gate mechanism.
  • Figure 5: Example of prediction results for the KTH dataset. Top: 1-10 time steps input sequence; Middle: 11-30 time steps ground truth sequence; Bottom: 11-30 time steps prediction sequence.
  • ...and 4 more figures
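
The captions for Figures 3 and 4 describe a Fourier layer followed serially by an MLP, and a second-order Runge-Kutta update that reuses one differential approximator and blends its outputs through a gate. The NumPy sketch below illustrates those two ideas under stated assumptions and is not the authors' implementation: the function names, the low-frequency truncation (`modes`), the `tanh` activation, and the sigmoid form of the gate are all hypothetical choices made for illustration.

```python
import numpy as np


def fourier_layer(x, weights, modes):
    """Spectral filter on a (channels, height, width) array.

    Keeps only the lowest `modes` x `modes` corner of the real FFT,
    scales it by per-channel complex weights, and transforms back.
    """
    c, h, w = x.shape
    x_hat = np.fft.rfft2(x, axes=(-2, -1))  # complex, shape (c, h, w // 2 + 1)
    out_hat = np.zeros_like(x_hat)
    out_hat[:, :modes, :modes] = x_hat[:, :modes, :modes] * weights
    return np.fft.irfft2(out_hat, s=(h, w), axes=(-2, -1))


def channel_mlp(x, w1, w2):
    """Two-layer MLP applied independently at every pixel (mixes channels)."""
    hidden = np.tanh(np.einsum("oc,chw->ohw", w1, x))
    return np.einsum("oc,chw->ohw", w2, hidden)


def fourier_residual_block(x, weights, modes, w1, w2):
    """Fourier layer, then MLP, computed serially inside a skip connection."""
    return x + channel_mlp(fourier_layer(x, weights, modes), w1, w2)


def gated_rk2_step(u, f, gate_w, dt=1.0):
    """One second-order Runge-Kutta step with a learned blend of two slopes.

    Both slopes come from the same approximator `f`. The sigmoid gate is an
    assumed form: with gate_w = 0 it equals 0.5 everywhere and the step
    reduces to Heun's method.
    """
    k1 = f(u)              # slope at the current state
    k2 = f(u + dt * k1)    # slope at the Euler-predicted state
    logits = np.einsum("oc,chw->ohw", gate_w, np.concatenate([k1, k2], axis=0))
    gate = 1.0 / (1.0 + np.exp(-logits))  # values in (0, 1)
    return u + dt * (gate * k1 + (1.0 - gate) * k2)


# Usage with random stand-in parameters (untrained, for illustration only).
rng = np.random.default_rng(0)
c, h, w, modes = 4, 16, 16, 4
u = rng.standard_normal((c, h, w))
weights = (rng.standard_normal((c, modes, modes))
           + 1j * rng.standard_normal((c, modes, modes)))
w1 = 0.1 * rng.standard_normal((8, c))
w2 = 0.1 * rng.standard_normal((c, 8))
gate_w = 0.1 * rng.standard_normal((c, 2 * c))

u_f = fourier_residual_block(u, weights, modes, w1, w2)  # Fourier-pathway feature


def f(state):
    # Hypothetical differential approximator built from the same pieces.
    return channel_mlp(fourier_layer(state, weights, modes), w1, w2)


u_next = gated_rk2_step(u, f, gate_w, dt=0.5)
print(u_f.shape, u_next.shape)
```

Setting the gate weights to zero recovers the fixed 1/2 weighting of Heun's method, which gives a simple sanity check for the update before any training.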