Physics-Informed Neural Controlled Differential Equations for Scalable Long Horizon Multi-Agent Motion Forecasting

Shounak Sural, Charles Kekeh, Wenliang Liu, Federico Pecora, Mouhacine Benosman

TL;DR

This work tackles long-horizon, multi-agent motion forecasting for fleets of robots by modeling the dynamics in continuous time with neural controlled differential equations (NCDEs). PINCoDE learns a joint latent representation via an autoencoder and propagates it through a latent NCDE conditioned on future goal velocities, with physics-informed losses that encourage dynamically feasible motion. The approach reports an ADE below 0.5 m over a 1-minute horizon and scales from 10 to 100 robots without adding parameters. Curriculum learning yields a 2.7× reduction in pose error relative to analytical baselines over 4-minute horizons. The framework yields a differentiable surrogate simulator suitable for planning and policy learning, with runtime adequate for practical use and a clear path to extending it to pedestrians and other uncontrollable agents.

Abstract

Long-horizon motion forecasting for multiple autonomous robots is challenging due to non-linear agent interactions, compounding prediction errors, and continuous-time evolution of dynamics. Learned dynamics of such a system can be useful in applications such as travel time prediction, prediction-guided planning, and generative simulation. In this work, we aim to develop an efficient trajectory forecasting model conditioned on multi-agent goals. Motivated by the recent success of physics-guided deep learning for partially known dynamical systems, we develop a model based on neural Controlled Differential Equations (CDEs) for long-horizon motion forecasting. Unlike discrete-time methods such as RNNs and transformers, neural CDEs operate in continuous time, allowing us to combine physics-informed constraints and biases to jointly model multi-robot dynamics. Our approach, named PINCoDE (Physics-Informed Neural Controlled Differential Equations), learns differential equation parameters that can be used to predict the trajectories of a multi-agent system starting from an initial condition. PINCoDE is conditioned on future goals and enforces physics constraints for robot motion over extended periods of time. We adopt a strategy that scales our model from 10 robots to 100 robots without the need for additional model parameters, while producing predictions with an average ADE below 0.5 m for a 1-minute horizon. Furthermore, progressive training with curriculum learning for our PINCoDE model results in a 2.7× reduction of forecasted pose error over 4-minute horizons compared to analytical models.
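The accuracy figures above are stated in terms of ADE (average displacement error). As a minimal sketch, assuming ADE is the mean Euclidean distance between predicted and ground-truth 2D positions taken over all agents and time steps (the exact averaging convention is not given in this summary), it could be computed as follows. The function name and the toy trajectories are illustrative, not taken from the paper.

```python
import math

def average_displacement_error(pred, truth):
    """Mean Euclidean position error over all agents and time steps.

    pred, truth: one list per agent, each a list of (x, y) positions in
    metres, sampled at the same time steps (assumed layout).
    """
    total = 0.0
    count = 0
    for agent_pred, agent_truth in zip(pred, truth):
        for (px, py), (tx, ty) in zip(agent_pred, agent_truth):
            total += math.hypot(px - tx, py - ty)
            count += 1
    return total / count

# Toy data: 2 agents, 2 time steps each (hypothetical values).
truth = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
pred = [[(0.0, 0.0), (1.0, 0.5)], [(0.3, 1.4), (0.0, 2.0)]]
ade = average_displacement_error(pred, truth)
print(f"ADE = {ade:.3f} m")
```

Under this convention, a forecast meets the reported 1-minute threshold when the returned value stays below 0.5.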

Paper Structure

This paper contains 22 sections, 14 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Motion of 4 robots over 8 seconds, with interaction zones marked. Between two consecutive pose recordings at 1 Hz, the robots can move and interact significantly; discrete-time models capture this poorly, whereas continuous-time dynamics modeling with PINCoDE captures it more explicitly. In practice, our model produces much longer goal-conditioned forecasts spanning 1-4 minutes and encompassing many more interactions among a larger number of agents.
  • Figure 2: Architecture of our model, which uses an autoencoder followed by a latent neural CDE that is additionally guided by reference controls for motion forecasting. $S_t$ and $Z_t$ represent the multi-agent state and latent state at time $t$, $D_L$ is the dimension of the learned latent representation, and $D_C$ is the dimension of the reference controls. FCN denotes a fully connected network, and $C$ represents a smoothed version of the control path obtained by taking a cumulative sum of the raw reference controls $c_t$.
  • Figure 3: Three instances of motion forecasting performance over a 60-second horizon. Blue arrows show ground truth and red arrows show predictions, with strong correspondence observed over long time horizons.
  • Figure 4: A few instances of control-conditioned motion forecasting in which the unicycle model (cyan) diverges heavily from the ground truth (blue), while predictions from our PINCoDE model (red), trained with curriculum learning, remain close to the ground truth over a 4-minute horizon.
  • Figure 5: Training PINCoDE with curriculum learning across progressively longer horizons results in significantly lower error than analytical models such as the unicycle.
  • ...and 2 more figures
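
The Figure 2 caption describes a control path $C$ built as a cumulative sum of raw reference controls $c_t$, which then drives a latent CDE. The sketch below illustrates that idea under stated assumptions: it uses an explicit Euler discretization of $dZ = f(Z)\,dC$ and a fixed toy vector field in place of the learned FCN. The solver, the function names, and all numeric values are hypothetical and are not taken from the paper.

```python
import math
from itertools import accumulate

def control_path(raw_controls):
    """Cumulative sum per control channel: C_k = c_0 + ... + c_k.

    raw_controls: list over time steps of lists of length D_C.
    """
    channels = zip(*raw_controls)                    # one tuple per channel
    summed = [list(accumulate(ch)) for ch in channels]
    return [list(step) for step in zip(*summed)]     # back to time-major

def euler_cde(z0, path, field):
    """Explicit Euler rollout of dZ = field(Z) dC along a sampled path.

    field(z) returns a D_L x D_C matrix (list of rows); each step applies
    Z_{k+1} = Z_k + field(Z_k) @ (C_{k+1} - C_k).
    """
    z = list(z0)
    trajectory = [list(z)]
    for prev, nxt in zip(path, path[1:]):
        d_c = [b - a for a, b in zip(prev, nxt)]
        matrix = field(z)
        z = [zi + sum(m * d for m, d in zip(row, d_c))
             for zi, row in zip(z, matrix)]
        trajectory.append(list(z))
    return trajectory

def toy_field(z):
    """Hypothetical stand-in for the learned FCN (D_L = 2, D_C = 2)."""
    return [[math.tanh(zi), 0.1] for zi in z]

# Hypothetical raw reference controls: 3 time steps, D_C = 2.
raw = [[1.0, 0.5], [2.0, 0.5], [3.0, -1.0]]
path = control_path(raw)
latents = euler_cde([0.2, -0.1], path, toy_field)
print(path)
print(latents[-1])
```

Because $C$ is a cumulative sum, each increment $C_{k+1} - C_k$ equals the raw control $c_{k+1}$, so the raw controls enter the update through the path increments. A practical implementation would typically interpolate the path and use a higher-order or adaptive ODE solver rather than fixed Euler steps.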