FACTS: A Factored State-Space Framework For World Modelling

Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber

TL;DR

FACTS proposes a permutable, graph-structured state-space memory with a memory-input routing mechanism to achieve permutation-invariant spatial-temporal world modelling. By representing memory as a set of latent factors and inputs as nodes, and using attention-based routing plus a linearisation trick around an initial memory Z_0, FACTS enables efficient long-horizon sequence modelling with parallelisable updates. Theoretical guarantees show left-permutation equivariance and right-permutation invariance, while empirical results demonstrate competitive or superior performance across multivariate time-series forecasting, object-centric world modelling, and dynamic graph prediction, including robustness to input permutation. This framework offers a general, scalable approach to robust world modelling with efficient history compression and strong cross-domain applicability.

Abstract

World modelling is essential for understanding and predicting the dynamics of complex systems by learning both spatial and temporal dependencies. However, current frameworks, such as Transformers and selective state-space models like Mambas, exhibit limitations in efficiently encoding spatial and temporal structures, particularly in scenarios requiring long-term high-dimensional sequence modelling. To address these issues, we propose a novel recurrent framework, the FACTored State-space (FACTS) model, for spatial-temporal world modelling. The FACTS framework constructs a graph-structured memory with a routing mechanism that learns permutable memory representations, ensuring invariance to input permutations while adapting through selective state-space propagation. Furthermore, FACTS supports parallel computation of high-dimensional sequences. We empirically evaluate FACTS across diverse tasks, including multivariate time series forecasting, object-centric world modelling, and spatial-temporal graph prediction, demonstrating that it consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design.
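The parallel computation mentioned above, together with the linearisation around an initial memory Z_0 noted in the TL;DR, can be illustrated with a minimal sketch. It assumes an elementwise gated recurrence Z_t = a_t * Z_{t-1} + b_t whose gates do not depend on Z_{t-1}; this form, the names `a`, `b`, `recurrent` and `parallel`, and the random gate values are all hypothetical stand-ins and not the paper's actual update rule. Under that assumption the recurrence is linear in the memory, so every step can be obtained in closed form from cumulative products and sums instead of a sequential loop.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, d = 16, 4, 8                      # time steps, memory factors, feature size
Z0 = rng.normal(size=(k, d))            # initial memory Z_0

# Hypothetical stand-ins for gates that, in a linearised model, would be
# computed from (Z_0, X_t) only and never from the previous memory Z_{t-1}.
a = rng.uniform(0.5, 1.0, size=(T, k, d))   # decay gates
b = rng.normal(size=(T, k, d))              # input contributions

def recurrent(Z0, a, b):
    """Sequential reference: Z_t = a_t * Z_{t-1} + b_t."""
    Z, out = Z0, []
    for a_t, b_t in zip(a, b):
        Z = a_t * Z + b_t
        out.append(Z)
    return np.stack(out)

def parallel(Z0, a, b):
    """Closed form: Z_t = P_t * (Z_0 + sum_{s<=t} b_s / P_s), P_t = prod_{r<=t} a_r."""
    P = np.cumprod(a, axis=0)
    return P * (Z0 + np.cumsum(b / P, axis=0))

match = np.allclose(recurrent(Z0, a, b), parallel(Z0, a, b))
print("parallel form matches recurrent form:", match)
```

Dividing by the cumulative product is only safe for short sequences with gates bounded away from zero; a practical implementation would more likely use an associative scan or log-space arithmetic.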

Paper Structure

This paper contains 21 sections, 4 theorems, 16 equations, 8 figures, 9 tables, and 2 algorithms.

Key Result

Theorem 1

$\mathbf{FACTS}$, as defined in equation eq:pfacts_t, is left-permutation equivariant (L.P.E.) and right-permutation invariant (R.P.I.).
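The two properties can be checked numerically on a toy attention-based router. This is a minimal sketch, not the FACTS update itself: the function `route` is a hypothetical single-head softmax cross-attention without learned projections, and reading L.P.E. as equivariance to reordering the memory factors and R.P.I. as invariance to reordering the input nodes is an assumption based on the TL;DR and the Figure 1 caption.

```python
import numpy as np

def route(Z, X):
    """Each memory factor (row of Z) attends over the input nodes (rows of X)."""
    d = Z.shape[-1]
    scores = Z @ X.T / np.sqrt(d)                        # (k, m) factor-to-node scores
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # normalise over input nodes
    return weights @ X                                   # (k, d) routed input per factor

rng = np.random.default_rng(0)
k, m, d = 4, 6, 8                  # memory factors, input nodes, feature size
Z = rng.normal(size=(k, d))
X = rng.normal(size=(m, d))
U = route(Z, X)

perm_in = rng.permutation(m)       # reorder the input nodes
perm_mem = rng.permutation(k)      # reorder the memory factors

# Invariance: shuffling the input nodes leaves the routed output unchanged.
rpi_holds = np.allclose(route(Z, X[perm_in]), U)
# Equivariance: shuffling the memory factors shuffles the output rows identically.
lpe_holds = np.allclose(route(Z[perm_mem], X), U[perm_mem])
print("input-permutation invariant:", rpi_holds)
print("memory-permutation equivariant:", lpe_holds)
```

The invariance holds because the softmax-weighted sum runs over the whole set of input nodes, so their order only changes the order of summation.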

Figures (8)

  • Figure 1: Overview of FACTS Architecture. The FACTS framework constructs a factored state-space memory, allowing for flexible representations (e.g. graphs and sets). Sequential inputs (e.g. $X_t$) are processed through a selective memory-input interaction mechanism (denoted by the circular icon $\lcirclearrowright$), which determines how the inputs interact with and update factored memory. The different coloured pathways represent distinct latent factors, whose dynamics evolve over time based on these interactions. The design ensures that the memory update is permutation-invariant with respect to the input features, enabling FACTS to capture and track meaningful algorithmic regularities for accurate future predictions.
  • Figure 2: Model robustness to input permutations on 4 MTSF datasets (left to right: Electricity, Traffic, ETTm1, SolarEnergy). Magenta bars represent original performance, salmon bars show performance using our/TSLib implementation, and yellow bars represent results under input permutation. Results are averaged over five random seeds, with error bars showing $\pm 2\times$ standard deviation.
  • Figure 3: Parallel vs Recurrent FACTS on long-term MTS (Electricity): MSE and MAE for different window/chunk sizes. Input (observable) sequence length set to 96.
  • Figure 4: RMSE, MAE, and MAPE of various approaches for long-term graph node forecasting (1-hour, 12-step ahead) on the METR-LA dataset.
  • Figure 4: Video reconstruction quality vs number of slots, with reconstruction quality measured by LPIPS$\downarrow$.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • Theorem: Restatement of Theorem \ref{therm:facts}
  • Proof
  • Theorem: Restatement of Theorem \ref{therm:generalised}
  • Proof