LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities
Florian Sestak, Artur Toshev, Andreas Fürst, Günter Klambauer, Andreas Mayr, Johannes Brandstetter
TL;DR
LaM-SLidE introduces a latent-space framework for spatial dynamical systems that preserves entity traceability through assignable identifiers (IDs) and a fixed-size latent representation. An encoder maps $N$ entities, each described by $(\mathbf{x},\mathbf{m},\mathbf{u})$, to a set of latent tokens; a decoder retrieves $\mathbf{X}$ and $\mathbf{M}$ from that set using ID-driven cross-attention; and a flow-based approximator evolves the latent state in time. Across pedestrian, basketball, N-body, and molecular dynamics datasets, LaM-SLidE achieves competitive or superior results on domain-relevant metrics (ADE, FDE, JSD, TICA, MSM) with up to an order-of-magnitude reduction in function evaluations. The method scales with model size and latent capacity, demonstrates robust ID assignment, and requires relatively modest inductive biases, indicating strong potential as a foundational approach for diverse dynamical systems. The work provides ablations on identifier pool size, ID assignment, and latent utilization, together with detailed experimental and architectural analyses.
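To make the encoder and decoder roles described above concrete, the following is a minimal NumPy sketch of ID-driven cross-attention, not the authors' implementation. All names and sizes (`id_table`, `latent_queries`, `W_in`, `W_out`, `POOL`, the feature dimensions) are hypothetical, the weights are random stand-ins for learned parameters, and $\mathbf{x}$, $\mathbf{m}$, $\mathbf{u}$ are treated simply as generic per-entity feature arrays. The flow-based approximator that evolves the latent state is omitted, and only positions are decoded.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention without learned projections.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values

rng = np.random.default_rng(0)
N, L, D = 5, 8, 16      # entities, latent tokens, token width (hypothetical sizes)
POOL = 32               # identifier pool size; must be at least N

# Random stand-ins for parameters that would be learned in practice.
id_table = rng.normal(size=(POOL, D))      # one embedding per identifier
latent_queries = rng.normal(size=(L, D))   # fixed number of latent queries
W_in = rng.normal(size=(3 + 1 + 1, D))     # projects concatenated [x, m, u]
W_out = rng.normal(size=(D, 3))            # reads positions out of decoded tokens

def assign_ids(n, pool_size, rng):
    # Each entity receives a distinct identifier drawn from the pool.
    return rng.choice(pool_size, size=n, replace=False)

def encode(x, m, u, ids):
    # Entity tokens carry both features and the embedding of their identifier.
    tokens = np.concatenate([x, m, u], axis=-1) @ W_in + id_table[ids]
    # Latent queries attend over entities, so the output has L rows for any N.
    return cross_attention(latent_queries, tokens, tokens)

def decode(z, ids):
    # ID embeddings act as queries that retrieve each entity from the latent set.
    return cross_attention(id_table[ids], z, z) @ W_out

x = rng.normal(size=(N, 3))   # positions
m = rng.normal(size=(N, 1))   # assumed per-entity feature
u = rng.normal(size=(N, 1))   # assumed per-entity feature
ids = assign_ids(N, POOL, rng)

z = encode(x, m, u, ids)      # shape (L, D), independent of N
x_hat = decode(z, ids)        # shape (N, 3), one row per queried identifier
```

In this sketch the latent set has the same shape regardless of how many entities are encoded, and the order of the decoded rows is determined solely by the order of the queried identifiers, which is what keeps individual entities traceable. With untrained weights `x_hat` is of course not a reconstruction of `x`; the sketch only illustrates the data flow.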
Abstract
Generative models are spearheading recent progress in deep learning, showcasing strong promise for trajectory sampling in dynamical systems as well. However, whereas latent space modeling paradigms have transformed image and video generation, similar approaches are more difficult for most dynamical systems. Such systems -- from chemical molecule structures to collective human behavior -- are described by interactions of entities, making them inherently linked to connectivity patterns, entity conservation, and the traceability of entities over time. Our approach, LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities), bridges the gap between: (1) keeping the traceability of individual entities in a latent system representation, and (2) leveraging the efficiency and scalability of recent advances in image and video generation, where pre-trained encoders and decoders enable generative modeling directly in latent space. The core idea of LaM-SLidE is the introduction of identifier representations (IDs) that enable the retrieval of entity properties and entity composition from latent system representations, thus fostering traceability. Experimentally, across different domains, we show that LaM-SLidE performs favorably in terms of speed, accuracy, and generalizability. Code is available at https://github.com/ml-jku/LaM-SLidE .
