Pixel State Value Network for Combined Prediction and Planning in Interactive Environments

Sascha Rosbach, Stefan M. Leupold, Simon Großjohann, Stefan Roth

TL;DR

This work proposes a deep learning methodology that combines prediction and planning: a conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences, one representing explicit motion predictions and the other pixel state values suitable for planning.

Abstract

Automated vehicles operating in urban environments have to reliably interact with other traffic participants. Planning algorithms often utilize separate prediction modules that forecast probabilistic, multi-modal, and interactive behaviors of objects. Designing prediction and planning as two separate modules introduces significant challenges, particularly due to the interdependence of these modules. This work proposes a deep learning methodology to combine prediction and planning. A conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences. The sequences represent explicit motion predictions, mainly used to train context understanding, and pixel state values suitable for planning, which encode kinematic reachability, object dynamics, safety, and driving comfort. The model can be trained offline on target images rendered by a sampling-based model-predictive planner, leveraging real-world driving data. Our results demonstrate intuitive behavior in complex situations, such as lane changes amidst conflicting objectives.
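
As an illustration of how pixel state values could feed a sampling-based planner, the sketch below scores candidate trajectories by looking up one value per time layer and summing them. The abstract does not state how values are aggregated or how the image grid is laid out, so the summation, the `resolution` and `origin` parameters, and the function names are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def score_trajectories(value_maps, trajectories, resolution, origin):
    """Sum pixel state values along each candidate trajectory.

    value_maps:   (T, H, W) array, one state-value image per time layer.
    trajectories: (N, T, 2) array of (x, y) positions in metres.
    resolution:   metres per pixel (assumed grid layout).
    origin:       (x, y) world position of pixel row 0, column 0 (assumed).
    """
    num_layers, height, width = value_maps.shape

    # Convert metric positions to integer pixel indices.
    cols = np.floor((trajectories[..., 0] - origin[0]) / resolution).astype(int)
    rows = np.floor((trajectories[..., 1] - origin[1]) / resolution).astype(int)

    # Positions outside the image contribute zero value (an assumption).
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    rows = np.clip(rows, 0, height - 1)
    cols = np.clip(cols, 0, width - 1)

    # Each trajectory step t reads from time layer t.
    layer_idx = np.arange(num_layers)[None, :]
    values = value_maps[layer_idx, rows, cols]  # shape (N, T)
    values = np.where(inside, values, 0.0)
    return values.sum(axis=1)


def select_best(value_maps, trajectories, resolution=0.5, origin=(0.0, 0.0)):
    """Return the index of the highest-scoring trajectory and all scores."""
    scores = score_trajectories(value_maps, trajectories, resolution, origin)
    return int(np.argmax(scores)), scores


# Toy example: two time layers on a 4x4 grid, three candidate trajectories.
value_maps = np.zeros((2, 4, 4))
value_maps[0, 1, 1] = 1.0
value_maps[1, 2, 2] = 2.0
trajectories = np.array([
    [[0.6, 0.6], [1.1, 1.1]],    # passes through both valued pixels
    [[0.1, 0.1], [0.1, 0.1]],    # stays in a zero-valued pixel
    [[-1.0, -1.0], [1.1, 1.1]],  # first step lies outside the image
])
best, scores = select_best(value_maps, trajectories)
```

Nearest-pixel lookup keeps the example short; a planner working at sub-pixel accuracy might interpolate between neighbouring pixels instead.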

Paper Structure (13 sections, 3 equations, 5 figures)

Figures (5)

  • Figure 1: The proposed architecture, comprising the pixel state value network (PSVN), the planning module, and the rendering module. The rendering module uses policy sets and object predictions to generate targets for offline training of the PSVN. The PSVN infers pixel state values for the planning module. Object predictions and the rendering module (dashed lines) are not required during inference.
  • Figure 2: Depictions of the input representation for the neural network: (a) Velocities of objects and speed limits of centerlines. (b) Directions of objects and centerlines. (c) Accelerations of objects and target lane. (d) Static objects and boundaries.
  • Figure 3: Targets (a,c,e,g) and predictions (b,d,f,h) for straights, roundabout, and merge situations. Motion predictions of objects are shown in blue. State values are shown in red, yellow, and green. Overlays (boundaries, target lane, ground-truth objects) are shown in grey. The trajectories passing through the time layers are drawn in black. The trajectories in (a,c,e,g) are selected based on the predicted state value images, and those in (b,d,f,h) based on the trajectory values.
  • Figure 4: Confusion matrices contrasting the planner's optimal policy under the reward function with its optimal policy under the PSVN, for 4732 test scenarios that each contain objects.
  • Figure :