Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction

Harsh Yadav, Tobias Meisen

Abstract

Current trajectory prediction models are primarily trained open-loop, which often leads to covariate shift and compounding errors when they are deployed in real-world, closed-loop settings. Furthermore, relying on static datasets or non-reactive log-replay simulators severs the interactive loop, preventing the ego agent from learning to actively negotiate with surrounding traffic. In this work, we propose an on-policy closed-loop training paradigm optimized for high-frequency, receding-horizon ego prediction. To ground the ego prediction in a realistic representation of traffic interactions and to achieve reactive consistency, we introduce a goal-oriented, transformer-based scene decoder, resulting in an inherently reactive training simulation. By exposing the ego agent to a mixture of open-loop data and simulated, self-induced states, the model learns recovery behaviors that correct its own execution errors. Extensive evaluation demonstrates that closed-loop training significantly improves collision avoidance at high replanning frequencies, yielding relative collision rate reductions of up to 27.0% on nuScenes and 79.5% in dense DeepScenario intersections compared to open-loop baselines. Additionally, we show that a hybrid simulation combining reactive and non-reactive surrounding agents achieves the best balance between immediate interactivity and long-term behavioral stability.

Paper Structure (24 sections, 5 equations, 4 figures, 13 tables, 2 algorithms)

Figures (4)

  • Figure 1: Network design with multimodal ego and joint unimodal scene predictions
  • Figure 2: Rollout analysis across varying replanning frequencies. Per-timestep collision rate (top) and L2 error (bottom) for $t \in [1, T_{pred}]$, corresponding to $[0.5\text{s}, 6.0\text{s}]$, on nuScenes.
  • Figure 3: The first subfigure (fig:nuS_scene_ablate_average_0_1_col) displays the ego agent's collisions with surrounding agents occurring at the beginning of the closed-loop rollouts, while the second subfigure (fig:nuS_scene_ablate_average_4_6_col) shows collisions occurring in the long tail of the closed-loop rollouts.
  • Figure 4: We compare scene predictions conditioned solely on the final observed position, $t_{goal}=T_{obs}(\leq T_{pred})$ (Top Row), against our proposed strategy of randomly sampling goal tokens from the full observation window, $t_{goal} \sim \mathcal{U}[1, T_{obs}]$ (Bottom Row). Relying exclusively on the final observed state causes the model to over-fixate on the trajectory up to the goal position, often leading to divergent behavior for $t>T_{obs}$. In contrast, our randomized goal sampling forces the decoder to interpret the goal token as a latent navigational intent rather than a rigid endpoint, resulting in robust, map-compliant predictions that extend smoothly beyond the observation horizon.
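
The two goal-selection strategies compared in the Figure 4 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sample_goal_tokens`, the array layout, and the simplification that all agents share one $T_{obs}$ are assumptions made only for illustration. It shows the difference between always taking the final observed position ($t_{goal}=T_{obs}$) and drawing $t_{goal} \sim \mathcal{U}[1, T_{obs}]$ per agent.

```python
import numpy as np


def sample_goal_tokens(logged_xy, rng, randomize=True):
    """Select one goal position per agent from its logged trajectory.

    logged_xy: array of shape (num_agents, T_obs, 2); step t (1-indexed, as in
        the caption) is stored at index t - 1. A shared T_obs across agents is
        an assumption of this sketch.
    rng: a numpy.random.Generator.
    randomize: if True, draw t_goal ~ U[1, T_obs] independently per agent;
        if False, use the final observed step t_goal = T_obs.
    Returns (goal_xy, t_goal) with shapes (num_agents, 2) and (num_agents,).
    """
    num_agents, t_obs, _ = logged_xy.shape
    if randomize:
        # Generator.integers excludes the upper bound, hence t_obs + 1.
        t_goal = rng.integers(1, t_obs + 1, size=num_agents)
    else:
        t_goal = np.full(num_agents, t_obs)
    # Convert the 1-indexed step to a 0-indexed array position.
    goal_xy = logged_xy[np.arange(num_agents), t_goal - 1]
    return goal_xy, t_goal


# Toy data: 3 agents, T_obs = 8, position at step t equals (t, t).
rng = np.random.default_rng(0)
toy_xy = np.cumsum(np.ones((3, 8, 2)), axis=1)
random_goal_xy, random_t_goal = sample_goal_tokens(toy_xy, rng, randomize=True)
final_goal_xy, final_t_goal = sample_goal_tokens(toy_xy, rng, randomize=False)
```

With the randomized variant, the same agent is paired with goals at different distances along its path across training samples, which is one plausible reading of why the decoder cannot treat the goal as a fixed endpoint at $T_{obs}$.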