Flow-guided Motion Prediction with Semantics and Dynamic Occupancy Grid Maps

Rabbia Asghar, Wenqian Liu, Lukas Rummelhard, Anne Spalanzani, Christian Laugier

TL;DR

This work tackles the problem of multi-step driving-scene prediction by fusing probabilistic Dynamic Occupancy Grid Maps (DOGMs) with semantic information to predict both future semantic grids and scene flow. It introduces a flow-guided, multi-task framework that outputs a sequence of semantic grids and per-cell flow maps, then warps the current semantic grid using the predicted flows to obtain warped future grids. The model employs a conditional variational autoencoder with ConvLSTM/ConvGRU components and a dual decoder to produce present and future predictions, trained with BCE losses for semantics, an L1 flow loss, and a KL regularizer. Evaluated on NuScenes, the approach yields improved prediction accuracy and better retention of dynamic vehicles, highlighting the practical impact of incorporating scene flow for autonomous driving planning in allo-centric coordinates.
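
The training objective named above (BCE for semantics, L1 for flow, KL as a regularizer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the weights `w_sem`, `w_flow`, `w_kl`, and the diagonal-Gaussian form of the latent distributions are assumptions made for illustration, and the summary does not state how the terms are weighted.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and binary targets."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def l1(pred_flow, target_flow):
    """Mean absolute error between predicted and reference flow maps."""
    return float(np.mean(np.abs(pred_flow - target_flow)))

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions (assumed form)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return float(np.sum(kl))

def total_loss(sem_pred, sem_gt, flow_pred, flow_gt,
               mu_q, logvar_q, mu_p, logvar_p,
               w_sem=1.0, w_flow=1.0, w_kl=1.0):
    """Weighted sum of the three terms; the weights are hypothetical placeholders."""
    return (w_sem * bce(sem_pred, sem_gt)
            + w_flow * l1(flow_pred, flow_gt)
            + w_kl * kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p))
```

In a conditional VAE the KL term typically pulls the distribution that conditions only on past inputs toward the one that also sees the future during training; whether the paper uses exactly this pairing is not stated in this summary.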

Abstract

Accurate prediction of driving scenes is essential for road safety and autonomous driving. Occupancy Grid Maps (OGMs) are commonly employed for scene prediction due to their structured spatial representation, flexibility across sensor modalities and integration of uncertainty. Recent studies have successfully combined OGMs with deep learning methods to predict the evolution of the scene and learn complex behaviours. These methods, however, do not consider prediction of flow or velocity vectors in the scene. In this work, we propose a novel multi-task framework that leverages dynamic OGMs and semantic information to predict both future vehicle semantic grids and the future flow of the scene. This incorporation of semantic flow not only offers intermediate scene features but also enables the generation of warped semantic grids. Evaluation on the real-world NuScenes dataset demonstrates improved prediction capabilities and enhanced ability of the model to retain dynamic vehicles within the scene.

Paper Structure (24 sections, 3 equations, 4 figures, 2 tables)

This paper contains 24 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An overview of our proposed network. Our network utilizes a sequence of DOGMs, occupancy state grids, associated velocity grids, and semantic grids $X_{t-N:t}$ as input to capture the scene evolution. Subsequently, it predicts a sequence of future vehicle semantic grids $\hat{Y}_{t+1:t+P}$ and scene flows $\hat{F}_{t+1:t+P}$.
  • Figure 2: To obtain semantic information, we fuse BEV features encoded from camera images with the occupancy state grid. The Red, Green, and Blue channels in the RGB grid denote the unknown, dynamic occupied, and static occupied states, respectively. The black areas within the occupancy state grid indicate free space.
  • Figure 3: In $F_{t+1}$, backward flow vectors indicate the motion of vehicles and point to the origin of their respective occupancy in $Y_t$. Red and blue represent dynamic vehicles moving upward and downward within the grid, respectively.
  • Figure 4: Semantic flow and warped occupancy prediction examples on three scenes from the NuScenes dataset [nuscenes2019], each covering an area of 60 m × 60 m. The first column displays the DOGM input with semantic labels (in white) at the latest timestep. Flow and warped occupancies are shown for 1.0s and 2.0s. Occupancies are color-coded to visualize the range from 0.0s to 2.0s, transitioning from black to purple. The last column shows a magnified view of the current static part of the scene, with agent predictions overlaid. The magnified scenes correspond to the yellow and red boxes, providing a zoomed-in view of the multimodal predictions overlaid on occupancies in the DOGM.
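
The backward-flow convention in Figure 3 (each flow vector in $F_{t+1}$ points to where the occupancy originated in $Y_t$) implies that a warped grid can be built by sampling $Y_t$ at each cell's flow-displaced location. The sketch below shows that idea with nearest-neighbour sampling in NumPy. It is an illustration under stated assumptions, not the paper's warping operator: the function name `warp_backward`, the (row, column) flow layout in cell units, and the zero fill for out-of-grid sources are all hypothetical choices.

```python
import numpy as np

def warp_backward(grid, flow):
    """Warp a 2D grid with a per-cell backward flow (nearest-neighbour sampling).

    grid: (H, W) array, e.g. vehicle occupancy at time t.
    flow: (H, W, 2) array; flow[r, c] is the (row, col) offset, in cells, from
          cell (r, c) at t+1 back to its source cell at t (assumed convention).
    """
    h, w = grid.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.rint(rows + flow[..., 0]).astype(int)
    src_c = np.rint(cols + flow[..., 1]).astype(int)
    # Cells whose source falls outside the grid are left at zero.
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    warped = np.zeros_like(grid)
    warped[valid] = grid[src_r[valid], src_c[valid]]
    return warped

# Example: one occupied cell at (2, 2); a uniform backward flow of +1 row
# means every cell at t+1 takes its value from the cell one row below at t,
# so the occupancy moves upward to (1, 2).
y_t = np.zeros((5, 5))
y_t[2, 2] = 1.0
f_next = np.zeros((5, 5, 2))
f_next[..., 0] = 1.0
y_warped = warp_backward(y_t, f_next)
```

A differentiable variant would use bilinear rather than nearest-neighbour sampling so that gradients can reach the flow prediction; which interpolation the paper uses is not stated here.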