When do World Models Successfully Learn Dynamical Systems?

Edmund Ross, Claudia Drygala, Leonhard Schwarz, Samir Kaiser, Francesca di Mare, Tobias Breiten, Hanno Gottschalk

TL;DR

The paper tackles learning dynamical systems governed by PDEs from tokenized observations using World Models. It formalizes a system-theoretic framework with a tokenization $h:\mathcal{X}\to\mathbb{R}^m$, latent state $y=h(x)$, reconstruction $G:\mathcal{Y}\to\mathcal{X}$, and autoregressive update $g:\mathcal{Y}^k\to\mathcal{Y}$, and shows that observability guarantees the existence of $G$ and $g$, together with a propagator $S_\Delta$ acting in the latent space. The authors prove PAC-like guarantees for learning $g$ and $G$ and validate the approach on heat, wave, Kuramoto-Sivashinsky (KS), and cylinder-flow data, showing low prediction error and strong temporal coherence while achieving substantial speedups over large-eddy simulation (LES). Compared to neural-operator baselines (e.g., FNO, DeepONet), the world-model framework offers improved long-horizon stability and efficiency on both in-distribution and out-of-distribution test data. The work provides a principled link between observability and learnability for world models and demonstrates practical utility in turbulent-flow synthesis and CFD surrogacy.

Abstract

In this work, we explore the use of compact latent representations with learned time dynamics ('World Models') to simulate physical systems. Drawing on concepts from control theory, we propose a theoretical framework that explains why projecting time slices into a low-dimensional space and then concatenating to form a history ('Tokenization') is so effective at learning physics datasets, and characterise when exactly the underlying dynamics admit a reconstruction mapping from the history of previous tokenized frames to the next. To validate these claims, we develop a sequence of models with increasing complexity, starting with least-squares regression and progressing through simple linear layers, shallow adversarial learners, and ultimately full-scale generative adversarial networks (GANs). We evaluate these models on a variety of datasets, including modified forms of the heat and wave equations, the chaotic regime 2D Kuramoto-Sivashinsky equation, and a challenging computational fluid dynamics (CFD) dataset of a 2D Kármán vortex street around a fixed cylinder, where our model is successfully able to recreate the flow.

Paper Structure

This paper contains 55 sections, 6 theorems, 50 equations, 15 figures, 8 tables.

Key Result

Lemma 2

Suppose the dynamical system is observable in the sense of Definition 1 (i). Then the autoregressive dynamics defined in Definition 1 (ii) exist.
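
For intuition about the hypothesis of Lemma 2, the following is a minimal sketch of the classical observability test in the finite-dimensional linear setting, not the paper's own definition, which is not reproduced on this page. It assumes dynamics $x(t+1)=Ax(t)$ and a linear tokenization $y=Cx$, and applies the Kalman rank condition: the pair $(A, C)$ is observable exactly when the stacked matrix $[C;\, CA;\, \ldots;\, CA^{n-1}]$ has rank $n$. The matrices `A`, `C_good`, and `C_bad` are illustrative assumptions.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for the linear system x(t+1) = A x(t), y = C x."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, C):
    """Kalman rank condition: observable iff the stacked matrix has full rank n."""
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]

# Illustrative system with two decaying modes (an assumption, not from the paper).
A = np.array([[0.9, 0.0],
              [0.0, 0.5]])
C_good = np.array([[1.0, 1.0]])  # token mixes both modes
C_bad = np.array([[1.0, 0.0]])   # token never sees the second mode

print(is_observable(A, C_good), is_observable(A, C_bad))
```

With `C_bad`, two states that differ only in the second mode produce identical token histories, so no map from token histories back to the full state can exist.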

Figures (15)

  • Figure 1: The full‐state time‐series data $x_{t-k:t}=(x(t-k),\ldots,x(t))$ is tokenized via $y_{\mathrm{seq}}(t)=h(x_{t-k:t})$, yielding a low‐dimensional representation. The next observation is predicted by $y(t+1)=g(y_{\mathrm{seq}}(t))$, then appended (dropping the oldest entry) to form $y_{\mathrm{seq}}(t+1)$, and finally the full state is reconstructed via $x(t+1)=G(y_{\mathrm{seq}}(t+1))$.
  • Figure 2: Average $L_2$ residues of our low-res heat equation model against epoch, for the test set. Each x-axis unit represents 2000 epochs. Data is normalised between $[-1, 1]$. The unmodified (non-observable) heat equation has history size $16$.
  • Figure 3: Full model $L_2$ error results, for the heat and wave equations. Each low-res model is fed the first 16 frames from the test set, and is left to generate the next 100 using its own output. The result is then fed to the super-res model.
  • Figure 4: (Left) Residues of our KSE model against epoch, for the low-res test set. For data normalised between $[-1, 1]$, we obtain a minimum average $L_2$ loss of $4 \times 10^{-6}$ and minimum average $L_1$ loss of $0.0014$.
  • Figure 6: $\Delta t$ vs $\rho(\Delta t)$ with $\Delta t \in \{1, \dots, 100\}$ for pixel 1 (left), and pixel 3 (right). The shaded areas show a $1\sigma$ range. The mean and standard deviation of the correlation are calculated across $50$ generated videos of $301$ frames each and across $100$ real videos of $301$ frames each.
  • ...and 10 more figures
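
The tokenize, predict, append, and reconstruct loop described in the Figure 1 caption can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the PDE is replaced by a hypothetical linear system $x(t+1)=Ax(t)$, the tokenization is an assumed linear map $h(x)=Cx$, and both $g$ and $G$ are fitted by ordinary least squares (the simplest model class named in the abstract). The names `A`, `C`, `W_g`, `W_G`, `simulate`, and `rollout`, and the sizes `n`, `m`, `k`, are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 1, 4  # state dim, token dim, history length (illustrative choices)

def rot(theta, r):
    c, s = np.cos(theta), np.sin(theta)
    return r * np.array([[c, -s], [s, c]])

# Hypothetical stand-in for the PDE solver: two damped rotations.
A = np.zeros((n, n))
A[:2, :2] = rot(0.5, 0.99)
A[2:, 2:] = rot(1.3, 0.97)
C = np.array([[1.0, 0.0, 1.0, 0.0]])  # assumed linear tokenization h(x) = C x

def simulate(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.stack(xs)  # shape (steps + 1, n)

# Training set: token histories, the next token, and the state at the last history frame.
H, Y_next, X_last = [], [], []
for _ in range(20):
    xs = simulate(rng.standard_normal(n), 60)
    ys = xs @ C.T
    for i in range(len(ys) - k):
        H.append(ys[i:i + k].ravel())
        Y_next.append(ys[i + k])
        X_last.append(xs[i + k - 1])
H, Y_next, X_last = map(np.asarray, (H, Y_next, X_last))

# Least-squares fits of the autoregressive update g and the reconstruction G.
W_g = np.linalg.lstsq(H, Y_next, rcond=None)[0]
W_G = np.linalg.lstsq(H, X_last, rcond=None)[0]

def rollout(y_init, steps):
    hist = y_init.copy()  # shape (k, m)
    xs = []
    for _ in range(steps):
        y_next = hist.ravel() @ W_g                    # y(t+1) = g(y_seq(t))
        hist = np.vstack([hist[1:], y_next[None, :]])  # append, drop the oldest entry
        xs.append(hist.ravel() @ W_G)                  # x(t+1) = G(y_seq(t+1))
    return np.stack(xs)

# Seed the model with k true tokens, then let it run on its own output.
steps = 50
x_true = simulate(rng.standard_normal(n), k + steps - 1)
y_true = x_true @ C.T
x_pred = rollout(y_true[:k], steps)
max_err = np.max(np.abs(x_pred - x_true[k:]))
print(f"max rollout error: {max_err:.2e}")
```

In this toy setting a single scalar token per frame suffices to recover the four-dimensional state because the history of `k = 4` tokens makes the system observable; with a shorter history the least-squares fits would no longer be exact.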

Theorems & Definitions (12)

  • Definition 1
  • Lemma 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4: PAC-learning of $g$ and $G$
  • Theorem 5: Autoregressive PAC-Learning of $S_\Delta$
  • Theorem 6
  • proof: Proof of Theorem 4
  • proof: Proof of Theorem 5
  • ...and 2 more