Vectorized Representation Dreamer (VRD): Dreaming-Assisted Multi-Agent Motion-Forecasting

Hunter Schofield, Hamidreza Mirkhani, Mohammed Elmahgiubi, Kasra Rezaee, Jinjun Shan

TL;DR

VRD is a vectorized, world model-inspired approach to multi-agent motion forecasting. It combines a traditional open-loop training regime with a novel dreamed closed-loop training pipeline that leverages a kinematic reconstruction task to imagine the trajectories of all agents, conditioned on the action of the ego vehicle.

Abstract

For an autonomous vehicle to plan a path in its environment, it must be able to accurately forecast the trajectory of all dynamic objects in its proximity. While many traditional methods encode observations in the scene to solve this problem, there are few approaches that consider the effect of the ego vehicle's behavior on the future state of the world. In this paper, we introduce VRD, a vectorized world model-inspired approach to the multi-agent motion forecasting problem. Our method combines a traditional open-loop training regime with a novel dreamed closed-loop training pipeline that leverages a kinematic reconstruction task to imagine the trajectory of all agents, conditioned on the action of the ego vehicle. Quantitative and qualitative experiments are conducted on the Argoverse 2 multi-world forecasting evaluation dataset and the intersection drone (inD) dataset to demonstrate the performance of our proposed model. Our model achieves state-of-the-art performance on the single prediction miss rate metric on the Argoverse 2 dataset and performs on par with the leading models for the single prediction displacement metrics.

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of our motion forecasting model. Based on the historical trajectories of all objects, the ego agent plans an initial trajectory. Using this trajectory, the ego dreams the future of all dynamic objects in the environment. Since the historical observation of the red car indicates slow movement, the model can infer that the red car is likely turning instead of going straight.
  • Figure 2: VRD pipeline overview. All map features and objects are processed into a vectorized latent space, $z_t$. Then, a dreamed rollout is produced by passing the trajectory along with the previous latent representation to the RSSM. The transition predictor estimates the next latent representation of the world which is decoded to obtain the kinematic states of all agents. This process is iterated to re-plan a new ego trajectory, closing the dreamed loop.
  • Figure 3: Six seconds of dreamed trajectories on the Argoverse 2 validation dataset. The green car represents the ego position at $t=0$ and the blue cars represent the dynamic objects at $t=0$. The purple line represents the ego's ground truth trajectory. The yellow and orange cars are the dreamed reconstructions of the ego and dynamic objects, respectively.
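
The dreamed closed loop described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: the learned encoder, RSSM transition predictor, decoder, and planner are replaced here by hand-written stand-ins, and every name (`encode`, `decode`, `plan_ego_action`, `transition`, `dream_rollout`), the state layout `(x, y, vx, vy)`, and all parameter values are assumptions made for illustration. In particular, the other agents in this sketch simply move at constant velocity, whereas a learned model would let them react to the ego. Only the control flow (encode, plan, predict the next latent, decode, re-plan) follows the caption.

```python
from typing import List, Tuple

# Hypothetical kinematic state per agent: (x, y, vx, vy). Index 0 is the ego.
State = Tuple[float, ...]
Latent = List[float]


def encode(agents: List[State]) -> Latent:
    """Stand-in encoder: flatten all agent states into one latent vector z_t."""
    return [value for agent in agents for value in agent]


def decode(z: Latent) -> List[State]:
    """Stand-in decoder: recover per-agent kinematic states from z_t."""
    return [tuple(z[i:i + 4]) for i in range(0, len(z), 4)]


def plan_ego_action(agents: List[State],
                    target_speed: float = 10.0,
                    gain: float = 0.5) -> Tuple[float, float]:
    """Stand-in planner: accelerate the ego along x toward a target speed."""
    ego_vx = agents[0][2]
    return (gain * (target_speed - ego_vx), 0.0)


def transition(z: Latent, ego_action: Tuple[float, float], dt: float) -> Latent:
    """Stand-in transition predictor: next latent from z_t and the ego action.

    The ego integrates the planned acceleration; every other agent keeps a
    constant velocity (an assumption of this toy, not of the paper).
    """
    ax, ay = ego_action
    next_z: Latent = []
    for index, (x, y, vx, vy) in enumerate(decode(z)):
        if index == 0:
            vx, vy = vx + ax * dt, vy + ay * dt
        next_z.extend([x + vx * dt, y + vy * dt, vx, vy])
    return next_z


def dream_rollout(agents: List[State], steps: int, dt: float = 0.1) -> List[List[State]]:
    """Iterate plan -> predict -> decode, re-planning from each dreamed state."""
    z = encode(agents)
    dreamed = []
    for _ in range(steps):
        action = plan_ego_action(decode(z))   # re-plan from the dreamed world
        z = transition(z, action, dt)         # predict the next latent
        dreamed.append(decode(z))             # decode kinematic states
    return dreamed
```

Calling `dream_rollout([(0.0, 0.0, 0.0, 0.0), (5.0, 2.0, 1.0, 0.0)], steps=60)` would produce six seconds of dreamed states at an assumed 0.1 s step, matching the horizon shown in Figure 3. Because the planner reads the decoded dreamed state at every step rather than a ground-truth observation, the loop is closed inside the model's own imagination.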