Predicting Future Spatiotemporal Occupancy Grids with Semantics for Autonomous Driving

Maneekwan Toyungyernsub, Esen Yel, Jiachen Li, Mykel J. Kochenderfer

TL;DR

The paper addresses future scene prediction for autonomous driving by integrating environment semantics into occupancy grid forecasting. It introduces a two-module framework: an upstream predictor that forecasts semantic grid maps (SMGMs), and a downstream occupancy predictor, based on a modified PredNet, that uses these semantic cues to generate future occupancy grid maps (OGMs) through an evidential occupancy representation with updates based on Dempster-Shafer theory (DST). The approach is validated on Waymo Open Dataset v1.4.0, where it shows higher accuracy and better preservation of moving objects over 1.5 s horizons than the PredNet and dynamics-aware Double-Prong baselines. The results suggest that semantic context is valuable for proactive trajectory planning and safer navigation; future work aims to predict semantics and occupancy jointly to reduce model size.
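To make the evidential representation concrete, the sketch below applies the standard Dempster rule of combination from DST (Dempster-Shafer theory) to a single grid cell. It is only an illustration under stated assumptions: the summary does not give the paper's exact update, and the names `Mass` and `combine`, as well as the two-hypothesis frame {occupied, free} with the remaining mass assigned to "unknown", are choices made here.

```python
from typing import NamedTuple


class Mass(NamedTuple):
    """Belief masses for one grid cell (hypothetical container, not from the paper)."""
    occupied: float
    free: float
    unknown: float  # mass on the whole frame {occupied, free}


def combine(m1: Mass, m2: Mass) -> Mass:
    """Fuse two mass assignments with Dempster's rule of combination."""
    # Conflict: one source says occupied while the other says free.
    conflict = m1.occupied * m2.free + m1.free * m2.occupied
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict

    # Each fused mass collects the products whose set intersection matches it.
    occupied = (m1.occupied * m2.occupied
                + m1.occupied * m2.unknown
                + m1.unknown * m2.occupied) / norm
    free = (m1.free * m2.free
            + m1.free * m2.unknown
            + m1.unknown * m2.free) / norm
    unknown = (m1.unknown * m2.unknown) / norm
    return Mass(occupied, free, unknown)


# A cell with no prior evidence, fused with one measurement.
prior = Mass(occupied=0.0, free=0.0, unknown=1.0)
measurement = Mass(occupied=0.6, free=0.1, unknown=0.3)
fused = combine(prior, measurement)
```

A vacuous prior (all mass on "unknown") leaves the measurement unchanged, and the fused masses always sum to one. This is how such a representation can distinguish unobserved cells from cells observed as empty.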

Abstract

For autonomous vehicles to proactively plan safe trajectories and make informed decisions, they must be able to predict the future occupancy states of the local environment. However, common issues with occupancy prediction include predictions where moving objects vanish or become blurred, particularly at longer time horizons. We propose an environment prediction framework that incorporates environment semantics for future occupancy prediction. Our method first semantically segments the environment and uses this information along with the occupancy information to predict the spatiotemporal evolution of the environment. We validate our approach on the real-world Waymo Open Dataset. Compared to baseline methods, our model has higher prediction accuracy and is capable of maintaining moving object appearances in the predictions for longer prediction time horizons.

Paper Structure (19 sections, 1 equation, 5 figures, 1 table)

Figures (5)

  • Figure 1: Future occupancy states are predicted from the past occupancy and semantic inputs.
  • Figure 2: The environment prediction model consists of the SMGM and OGM prediction modules. The predicted SMGM output is subsequently used as an input to the representation layer of the occupancy prediction module to predict future occupancy states. The occupancy prediction module is based on the PredNet architecture and is modified to incorporate semantic information. The figure shows the next-frame prediction ($t+1$). For multiple-frame prediction, the model treats both the previous OGM and SMGM predictions as inputs and recursively iterates to make next-frame predictions.
  • Figure 3: The full pipeline of the proposed method. Semantic grid maps (SMGMs) are generated from the semantic annotations predicted by the semantic segmentation module. The environment prediction model outputs the future OGM predictions from the past OGM and SMGM inputs.
  • Figure 4: Example driving scenarios and their OGM predictions (occupied: red, unknown: green, empty: blue). The predicted OGMs are shown at selected prediction time steps of 0.1 s, 0.5 s, 1.0 s, and 1.5 s. One scenario has multiple vehicles traveling straight; the other has a vehicle making a right turn. Compared to baseline methods, our model maintains the predicted moving object appearances for longer prediction time horizons in both example scenarios.
  • Figure 5: The MSE, IS, and dynamic MSE metrics on OGM predictions are evaluated at each time step in the prediction horizon. We note that the cell-wise standard error is too small to be visible in the MSE and the dynamic MSE plots. Lower is better.
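The recursive multi-frame prediction described in the Figure 2 caption, together with the per-time-step MSE evaluation in Figure 5, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `rollout`, `per_step_mse`, and `persistence_step` are hypothetical names, grids are flattened to plain lists of floats, and the toy step function simply repeats the last frame in place of the learned SMGM and OGM predictors.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[float]  # flattened grid map; a stand-in for a real 2D/3D tensor
StepFn = Callable[[Sequence[Grid], Sequence[Grid]], Tuple[Grid, Grid]]


def rollout(step_fn: StepFn,
            ogm_history: Sequence[Grid],
            smgm_history: Sequence[Grid],
            horizon: int) -> List[Grid]:
    """Predict `horizon` future OGMs by feeding predictions back as inputs."""
    ogms, smgms = list(ogm_history), list(smgm_history)
    window = len(ogms)  # keep the input length fixed as the window slides
    predictions: List[Grid] = []
    for _ in range(horizon):
        next_ogm, next_smgm = step_fn(ogms[-window:], smgms[-window:])
        predictions.append(next_ogm)
        # Both predictions become inputs for the next step.
        ogms.append(next_ogm)
        smgms.append(next_smgm)
    return predictions


def per_step_mse(predictions: Sequence[Grid],
                 targets: Sequence[Grid]) -> List[float]:
    """Mean squared error over cells, reported separately for each time step."""
    return [
        sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(pred)
        for pred, tgt in zip(predictions, targets)
    ]


def persistence_step(ogms: Sequence[Grid],
                     smgms: Sequence[Grid]) -> Tuple[Grid, Grid]:
    """Toy stand-in for the learned model: repeat the most recent frames."""
    return list(ogms[-1]), list(smgms[-1])


preds = rollout(persistence_step, [[0.0, 1.0]], [[2.0, 2.0]], horizon=3)
errors = per_step_mse(preds, [[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
```

Because each step consumes earlier predictions, errors can compound over the horizon, which is why reporting the metric per time step is informative.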