
Spatiotemporal System Forecasting with Irregular Time Steps via Masked Autoencoder

Kewei Zhu, Yanze Xin, Jinwei Hu, Xiaoyuan Cheng, Yiming Yang, Sibo Cheng

Abstract

Predicting high-dimensional dynamical systems with irregular time steps presents significant challenges for current data-driven algorithms. These irregularities arise from missing data, sparse observations, or adaptive computational techniques, reducing prediction accuracy. To address these limitations, we propose a novel method: a Physics-Spatiotemporal Masked Autoencoder. This method integrates convolutional autoencoders for spatial feature extraction with masked autoencoders optimised for irregular time series, leveraging attention mechanisms to reconstruct the entire physical sequence in a single prediction pass. The model avoids the need for data imputation while preserving the physical integrity of the system. Here, 'physics' refers to high-dimensional fields generated by underlying dynamical systems, rather than the enforcement of explicit physical constraints or PDE residuals. We evaluate this approach on multiple simulated datasets and real-world ocean temperature data. The results demonstrate that our method achieves significant improvements in prediction accuracy, robustness to nonlinearities, and computational efficiency over traditional convolutional and recurrent network methods. The model shows potential for capturing complex spatiotemporal patterns without requiring domain-specific knowledge, with applications in climate modelling, fluid dynamics, ocean forecasting, environmental monitoring, and scientific computing.
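The core idea in the abstract — replacing imputation with learnable mask tokens at missing and future steps, then reconstructing the whole latent sequence in one attention pass — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the convolutional encoder is stood in by random latent vectors, the mask token and projection weights are untrained, and all names (`observed`, `mask_token`, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 10, 16                           # sequence length, latent dimension
observed = np.array([0, 1, 3, 4, 6])    # irregular observed time steps

# Stand-in for latent states produced by a convolutional encoder.
z_obs = rng.normal(size=(len(observed), d))

# A single learnable mask token is padded at every missing/future step.
mask_token = rng.normal(size=(d,))
z = np.tile(mask_token, (T, 1))
z[observed] = z_obs

# Sinusoidal positional encoding marks each step's true time index,
# so irregular gaps are visible to the attention mechanism.
pos = np.arange(T)[:, None]
i = np.arange(d // 2)[None, :]
pe = np.zeros((T, d))
pe[:, 0::2] = np.sin(pos / 10000 ** (2 * i / d))
pe[:, 1::2] = np.cos(pos / 10000 ** (2 * i / d))
z = z + pe

# Single-head attention: queries come from all T steps (including mask
# tokens), but keys/values come only from observed latent states, so the
# complete sequence is reconstructed in a single pass.
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
q = z @ Wq
k = z[observed] @ Wk
v = z[observed] @ Wv
scores = q @ k.T / np.sqrt(d)           # (T, n_observed)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
z_out = attn @ v                        # (T, d): full latent sequence

print(z_out.shape)
```

In a trained model, `z_out` would then pass through the convolutional decoder to recover physical fields at every step, observed or not — contrast this with an RNN rolled out step by step over imputed inputs.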

Paper Structure

This paper contains 41 sections, 17 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: Comparison of sequence-to-sequence (Seq2Seq) prediction methods for dynamical systems with irregular time steps. Left: Traditional RNN-based models rely on step-wise roll-out and require data imputation to handle missing steps, which may introduce biases and cumulative errors. Right: Our model performs element-wise predictions in the latent space via an adaptive attention mechanism, reconstructing the complete sequence in a single pass.
  • Figure 2: Architecture of the proposed P-STMAE framework. (a) Convolutional encoder compresses physical states into latent representations. Positional encodings are added, and a masked transformer captures temporal dependencies in latent space. (b) Learnable masking tokens are padded at missing and future time steps. Transformer blocks process the sequence, and the convolutional decoder reconstructs the complete physical fields. (c) Each transformer block consists of layer normalization, multi-head self-attention, and a feedforward network. Self-attention operates only on observed latent states.
  • Figure 3: Ground truth (top) and error maps of P-STMAE, ConvRAE, and ConvLSTM for forecasting the variable $u$ in the shallow water dataset with a sampling dilation of 3. Columns represent successive forecasting steps. Among the models, P-STMAE yields the smallest errors, indicating its superior predictive accuracy.
  • Figure 4: Robustness analysis of P-STMAE on the shallow water dataset. (a) Performance comparison under varying numbers of missing steps in the input sequence with a length of 10. Each model is trained and evaluated with missing steps ranging from 1 to 6. P-STMAE demonstrates consistent performance and robustness, while the RNN-based models, especially ConvLSTM, show higher sensitivity to increasing missing steps. (b) Test performance comparison regarding the sampling dilations of data sequences. All models are separately trained on the shallow water dataset of different dilations.
  • Figure 5: Ground truth (top) and error maps of P-STMAE, ConvRAE, and ConvLSTM for forecasting the variable $u$ in the diffusion-reaction dataset with a sampling dilation of 5. Columns represent successive forecasting steps. The results show that P-STMAE consistently achieves lower errors than the baselines, confirming its advantage.
  • ...and 8 more figures
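Figure 2(c) describes each transformer block as layer normalization, multi-head self-attention, and a feedforward network. The sketch below shows one such block in NumPy, assuming a pre-norm residual ordering (LN → MHSA → add, LN → FFN → add) and a ReLU feedforward; these choices, and all dimensions and names, are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, h, d_ff = 10, 16, 4, 64   # steps, model dim, heads, FFN width

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, params):
    """Split d into h heads, attend per head, concatenate, project."""
    Wq, Wk, Wv, Wo = params
    dh = d // h
    q = (x @ Wq).reshape(T, h, dh).transpose(1, 0, 2)   # (h, T, dh)
    k = (x @ Wk).reshape(T, h, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, h, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    out = (attn @ v).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo

def transformer_block(x, attn_params, W1, b1, W2, b2):
    # Pre-norm residual structure around attention and feedforward.
    x = x + multi_head_attention(layer_norm(x), attn_params)
    hidden = np.maximum(layer_norm(x) @ W1 + b1, 0.0)   # ReLU FFN
    return x + hidden @ W2 + b2

attn_params = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
W1, b1 = rng.normal(size=(d, d_ff)) / np.sqrt(d), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)) / np.sqrt(d_ff), np.zeros(d)

z = rng.normal(size=(T, d))             # latent sequence with mask tokens
z_out = transformer_block(z, attn_params, W1, b1, W2, b2)
print(z_out.shape)
```

Stacking several such blocks over the mask-padded latent sequence, followed by the convolutional decoder, yields the complete reconstructed physical fields described in Figure 2(b).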