LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities

Florian Sestak, Artur Toshev, Andreas Fürst, Günter Klambauer, Andreas Mayr, Johannes Brandstetter

TL;DR

LaM-SLidE introduces a latent-space framework for spatial dynamical systems that preserves entity traceability through assignable IDs and a fixed-size latent representation. An encoder maps $N$ entities, each described by $(\mathbf{x},\mathbf{m},\mathbf{u})$, to a fixed-size set of latent tokens, while a decoder retrieves $\mathbf{X}$ and $\mathbf{M}$ from that set using ID-driven cross-attention; a flow-based approximator then evolves the latent state in time. Across pedestrian, basketball, N-body, and molecular dynamics datasets, LaM-SLidE achieves competitive or superior results on ADE/FDE and on domain-specific evaluations (JSD, TICA, MSM), with up to an order-of-magnitude reduction in function evaluations. The method scales with model size and latent capacity, demonstrates robust ID assignment, and requires relatively modest inductive biases, indicating strong potential as a foundational approach for diverse dynamical systems. The work provides ablations on identifier pool size, ID assignment, and latent utilization, and includes detailed experimental and architectural analyses.

Abstract

Generative models are spearheading recent progress in deep learning, showcasing strong promise for trajectory sampling in dynamical systems as well. However, whereas latent space modeling paradigms have transformed image and video generation, similar approaches are more difficult for most dynamical systems. Such systems -- from chemical molecule structures to collective human behavior -- are described by interactions of entities, making them inherently linked to connectivity patterns, entity conservation, and the traceability of entities over time. Our approach, LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities), bridges the gap between: (1) keeping the traceability of individual entities in a latent system representation, and (2) leveraging the efficiency and scalability of recent advances in image and video generation, where pre-trained encoder and decoder enable generative modeling directly in latent space. The core idea of LaM-SLidE is the introduction of identifier representations (IDs) that enable the retrieval of entity properties and entity composition from latent system representations, thus fostering traceability. Experimentally, across different domains, we show that LaM-SLidE performs favorably in terms of speed, accuracy, and generalizability. Code is available at https://github.com/ml-jku/LaM-SLidE .

Paper Structure

This paper contains 99 sections, 4 theorems, 18 equations, 17 figures, 16 tables, 5 algorithms.

Key Result

Proposition 3.0

Given an identifier pool ${\mathcal{I}}$ and a finite set of entities $E$, the identifier assignment pool $I$ (as defined in the paper's definition of the identifier assignment pool) is non-empty if and only if $|E| \leqslant |{\mathcal{I}}|$.
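
The cardinality condition in the proposition can be illustrated with a small sketch. It assumes, as the condition suggests, that a valid assignment gives every entity a distinct ID, and it draws IDs uniformly at random without replacement, which is only one simple way to realize such an assignment. The helper `assign_ids` and the atom labels are hypothetical and are not taken from the LaM-SLidE code base.

```python
import random


def assign_ids(entities, pool_size, rng=None):
    """Draw one injective map entity -> ID from the pool {0, ..., pool_size - 1}.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    entities = list(entities)
    if len(entities) > pool_size:
        # With more entities than IDs, no assignment with distinct IDs exists.
        raise ValueError("need |E| <= |I| for an injective assignment")
    # Sampling without replacement guarantees that all drawn IDs are distinct.
    ids = rng.sample(range(pool_size), k=len(entities))
    return dict(zip(entities, ids))


# Assumed example: four entities, a pool of eight identifiers.
atoms = ["C1", "C2", "O1", "H1"]
assignment = assign_ids(atoms, pool_size=8, rng=random.Random(0))
print(assignment)
```

Calling `assign_ids` with more entities than `pool_size` raises a `ValueError`, which mirrors the "only if" direction of the proposition.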

Figures (17)

  • Figure 1: Overview of our approach. Left: Conventional graph neural networks (GNNs) model time-evolving systems (e.g., molecular dynamics) by representing entities as nodes and iteratively updating node embeddings and positions to capture system dynamics across timesteps. Right: Latent diffusion models employ an encoder-decoder architecture to compress input data into a lower-dimensional latent space where generative modeling is performed. Latent diffusion models, frequently enhanced with conditional information such as text, excel at generative tasks; however, due to their fixed input/output structure, they are not directly adaptable to physical systems with a varying number of entities. Middle: Our proposed approach LaM-SLidE bridges these paradigms by: (1) introducing identifiers that allow traceability of individual entities, and (2) leveraging a latent system representation.
  • Figure 2: Architecture of our encoder-decoder structure (First Stage): Left: The encoder maps $N$ input tokens to a latent system representation by cross-attending to $L$ learned latent query tokens. The decoder reconstructs the input data from the latent representation using the assigned IDs. Right: Structure of the input token, consisting of an ID, spatial information, and features (see also Figure 3).
  • Figure 3: Example aspirin: IDs are assigned to the atoms of the molecule.
  • Figure 4: Left: The latent model receives conditioning via known tokens (observed timesteps) and mask tokens (for prediction). This example shows conditioning on one timeframe to predict three future ones. Right: ID-based decoding, where the predicted atom positions are decoded by the assigned IDs.
  • Figure 5: Expanded architectural overview. First Stage: The model is trained to reconstruct the encoded system by querying the latent system representation by IDs. Second Stage: Latent flow-based model is trained to predict multiple masked future timesteps. The predicted system states are decoded by the frozen decoder.
  • ...and 12 more figures
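
The shape bookkeeping described in the Figure 2 caption can be sketched in plain Python: $L$ latent queries cross-attend over $N$ entity tokens, and ID embeddings then query the latent to retrieve one output per entity. This is a minimal illustration under stated assumptions, not the authors' implementation. The weights are random stand-ins for learned parameters, there are no projection layers, the sizes `D`, `L`, and `POOL` are arbitrary, and forming an entity token by adding its ID embedding to its features is an assumption made to keep dimensions equal.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention on lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def rand_vecs(n, d, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
D, L, POOL = 4, 3, 8                    # hypothetical sizes
latent_queries = rand_vecs(L, D, rng)   # stand-in for learned latent queries
id_table = rand_vecs(POOL, D, rng)      # stand-in for learned ID embeddings


def encode(tokens):
    # L latent queries attend over N entity tokens -> always L latent tokens.
    return cross_attention(latent_queries, tokens, tokens)


def decode(latents, ids):
    # ID embeddings act as queries -> one retrieved vector per requested ID.
    queries = [id_table[i] for i in ids]
    return cross_attention(queries, latents, latents)


for n in (2, 5):
    ids = rng.sample(range(POOL), k=n)
    feats = rand_vecs(n, D, rng)  # placeholder for spatial information and features
    # Assumption: token = ID embedding + features (element-wise sum).
    tokens = [[a + b for a, b in zip(id_table[i], f)] for i, f in zip(ids, feats)]
    z = encode(tokens)
    out = decode(z, ids)
    print(n, len(z), len(out))
```

The latent always has `L` tokens regardless of how many entities go in, while the decoder returns exactly as many outputs as IDs are queried. This is the property that lets a fixed-size latent model handle systems with a varying number of entities.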

Theorems & Definitions (6)

  • Definition 3.0
  • Definition 3.0
  • Proposition 3.0
  • Proposition 3.0
  • Proposition F.0
  • Proposition F.0