STEMFold: Stochastic Temporal Manifold for Multi-Agent Interactions in the Presence of Hidden Agents

Hemant Kumawat, Biswadeep Chakraborty, Saibal Mukhopadhyay

TL;DR

STEMFold tackles the challenge of predicting multi-agent trajectories when many agents are unobserved, by learning a stochastic temporal manifold via a dynamic spatiotemporal graph and a neural ODE-based latent dynamical model. The method encodes observations into a temporal graph with time anchors, propagates latent states through time, and decodes predictions for visible agents, all while accounting for uncertainty due to hidden agents. Theoretical results based on Fisher information and CRLB show that temporal multisets provide richer information than static graphs, supporting improved estimation and predictions. Empirically, STEMFold outperforms strong baselines on simulated and real-world datasets under varying visibility, sensor failures, and noise, demonstrating robustness and adaptability for complex, partially observable multi-agent systems.
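
The encode, propagate, decode flow described above can be illustrated with a deliberately small sketch. It is not the authors' implementation: the encoder here is a plain mean/standard-deviation summary rather than spatiotemporal graph attention, the latent vector field `latent_dynamics` is a hypothetical fixed decay rather than a learned network, the solver is fixed-step Euler, and agent interactions are ignored entirely. All function names are assumptions made for illustration.

```python
import math
import random


def encode(observations, rng):
    """Toy stand-in for the encoder: summarize one visible agent's observed
    sequence as a Gaussian over the initial latent state and sample from it."""
    n = len(observations)
    mean = sum(observations) / n
    var = sum((x - mean) ** 2 for x in observations) / n
    std = math.sqrt(var) + 1e-6  # small floor keeps the posterior non-degenerate
    # Reparameterized sample: z0 = mean + std * eps, eps ~ N(0, 1)
    return mean + std * rng.gauss(0.0, 1.0)


def latent_dynamics(z, t):
    """Hypothetical latent vector field dz/dt = -0.5 * z.
    In a neural ODE this would be a learned network."""
    return -0.5 * z


def propagate(z0, times, steps_per_interval=100):
    """Integrate the latent ODE from t = 0 with fixed-step Euler and
    return the latent state at each requested (increasing) time."""
    states, z, t = [], z0, 0.0
    for t_next in times:
        h = (t_next - t) / steps_per_interval
        for _ in range(steps_per_interval):
            z += h * latent_dynamics(z, t)
            t += h
        states.append(z)
    return states


def decode(z):
    """Toy decoder: identity map from latent state to predicted observation."""
    return z


def predict(visible_obs, times, seed=0):
    """Run encode -> propagate -> decode independently for each visible agent."""
    rng = random.Random(seed)
    return {
        agent: [decode(z) for z in propagate(encode(obs, rng), times)]
        for agent, obs in visible_obs.items()
    }


if __name__ == "__main__":
    preds = predict({"agent_0": [1.0, 1.0, 1.0]}, times=[1.0, 2.0])
    print(preds)
```

Sampling the initial latent state, rather than computing it deterministically, is what lets a model of this shape express uncertainty about influences it cannot observe, such as hidden agents.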

Abstract

Learning accurate, data-driven predictive models for multiple interacting agents following unknown dynamics is crucial in many real-world physical and social systems. In many scenarios, dynamics prediction must be performed under incomplete observations, i.e., only a subset of agents is known and observable from a larger topological system, while the behaviors of the unobserved agents and their interactions with the observed agents are not known. When only incomplete observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to learn a closed-form model in these variables using either analytic or data-driven techniques. In this work, we propose STEMFold, a spatiotemporal attention-based generative model, to learn a stochastic manifold to predict the underlying unmeasured dynamics of the multi-agent system from observations of only visible agents. Our analytical results motivate the STEMFold design, which uses a spatiotemporal graph with time anchors to effectively map the observations of visible agents to a stochastic manifold with no prior information about interaction graph topology. We empirically evaluated our method on two simulations and two real-world datasets, where it outperformed existing networks in predicting complex multi-agent interactions, even with many unobserved agents.

Paper Structure (26 sections, 2 theorems, 31 equations, 25 figures, 8 tables)

This paper contains 26 sections, 2 theorems, 31 equations, 25 figures, and 8 tables.

Key Result

Theorem 1

The Fisher information of the embedding of the multiset $X_i$ is greater than the Fisher information of the embedding of each individual element $x_i(t)$, i.e., $\det(J(\phi)) > \det(I(\theta))$.
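
The intuition behind this statement can be checked in a much simpler setting than the paper's. The sketch below assumes scalar i.i.d. Gaussian observations with known standard deviation, where the Fisher information about the mean from $n$ samples is $n/\sigma^2$ and the Cramér-Rao lower bound on an unbiased estimator's variance is $\sigma^2/n$. In the scalar case the determinant is the value itself. This is not the paper's proof or its embedding setting; the function names are hypothetical.

```python
import random
import statistics


def fisher_information_gaussian_mean(sigma, n_obs=1):
    """Fisher information about the mean of N(mu, sigma^2)
    carried by n_obs i.i.d. samples: n_obs / sigma^2."""
    return n_obs / sigma ** 2


def empirical_estimator_variance(mu, sigma, n_obs, trials=20000, seed=0):
    """Monte Carlo variance of the sample-mean estimator built from n_obs samples."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n_obs)])
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    sigma = 2.0
    for n_obs in (1, 10):
        info = fisher_information_gaussian_mean(sigma, n_obs)
        crlb = 1.0 / info  # Cramér-Rao lower bound on estimator variance
        var = empirical_estimator_variance(0.0, sigma, n_obs)
        print(f"n_obs={n_obs}: Fisher info={info:.3f}, CRLB={crlb:.3f}, empirical var={var:.3f}")
```

A set of ten observations carries ten times the information of a single one, and the estimator variance shrinks accordingly. That is the same direction of effect the theorem claims for a temporal multiset compared with an individual element.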

Figures (25)

  • Figure 1: A) Problem landscape in prior works. 'a' and 'b' depict problems addressed in previous works, while 'c' illustrates the problem tackled in our work. B) Model overview. First, the encoder computes the initial latent states for edges and nodes from the observed sequence of agent observations and the adjacency matrix sequence. This computation occurs in two steps: Step 1 performs attention-based representation learning over the dynamic spatiotemporal graph, and Step 2 applies sequence attention to learn a posterior over the initial latent state. The neural ODE framework then propagates the latent state through time, and the decoder generates predicted observations for the agents.
  • Figure 2: Illustration of the spatiotemporal attention layer in action. Left: a spatiotemporal graph in which each node has an associated time series. Center (b): how the layer updates the target representation. Finally, the output is passed through the self-attention layer to obtain the initial latent distribution.
  • Figure 3: Basketball and CMU Mocap Dataset
  • Figure 4: Predicted trajectories for a system of 10 agents, 75% of which are hidden. Dotted lines represent predicted trajectories, while solid lines represent observed trajectories.
  • Figure 5: MSE vs. time for the spring system with 75% unobservable agents.
  • ...and 20 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2