Scene Informer: Anchor-based Occlusion Inference and Trajectory Prediction in Partially Observable Environments

Bernard Lange, Jiachen Li, Mykel J. Kochenderfer

TL;DR

The paper tackles occlusion inference and trajectory prediction in partially observable autonomous driving scenarios. It introduces Scene Informer, an end-to-end transformer-based framework that unifies occlusion reasoning with observed-agent forecasting, using anchors to represent occluded regions and predict occupancy $p_{occ}$ and $K$ multi-step trajectories $Y_{1:K}$ with probabilities $p_{1:K}$. The approach processes vectorized inputs (agent histories, maps) through a scene encoder and uses an anchor-based decoder to produce targeted predictions for both occluded and visible agents, while analyzing how different observability assumptions affect performance. On the Waymo Open Motion Dataset, Scene Informer achieves state-of-the-art results for occupancy prediction and improves trajectory prediction under partial observability, demonstrating robustness and practical impact for safer AV planning.

Abstract

Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, and forecasts motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
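
To make the idea of "selective queries" concrete, the following is a minimal sketch of anchor queries cross-attending over scene embeddings. It is not the authors' implementation: it assumes a single attention head, omits all learned projections, and uses the scene embeddings as both keys and values. The names `cross_attend`, `anchor_queries`, and `scene_embeddings` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(anchor_queries, scene_embeddings):
    """Single-head scaled dot-product cross-attention (illustrative only).

    anchor_queries:   list of A vectors of length d, one per queried anchor.
    scene_embeddings: list of S vectors of length d, used here as both
                      keys and values (an assumption of this sketch).
    Returns a list of A vectors of length d.
    """
    d = len(scene_embeddings[0])
    outputs = []
    for q in anchor_queries:
        # Similarity of this anchor query to every scene element.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in scene_embeddings
        ]
        weights = softmax(scores)
        # Weighted sum of scene embeddings for this anchor.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, scene_embeddings))
            for j in range(d)
        ])
    return outputs
```

In this sketch, each anchor is processed independently, so the cost grows with the number of anchors that are actually queried. Restricting the queries to occlusions near the planned path therefore limits the work done.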

Paper Structure (11 sections, 4 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: We introduce Scene Informer, an end-to-end prediction framework that considers both observed and occluded agents in a partially observable environment. It forecasts multi-modal futures for observed agents and estimates occupancy probabilities and most likely trajectories originating from the occlusion.
  • Figure 2: Scene Informer consists of a scene encoder and an anchor decoder. It reasons in terms of anchors (●) that are assigned to each observed agent and randomly populated in the occlusion of interest. The scene encoder aggregates different observation input modalities and creates scene embeddings. The anchor decoder cross-attends between scene embeddings and anchors and, for each anchor, outputs the predicted occupancy probability $p_{occ}$ and the $K$ most likely future trajectories $Y_{1:K}$ (■) with their probabilities $p_{1:K}$.
  • Figure 3: Visualization of the dataset. Each sample contains agents ($\blacksquare$) with a history of observations (■) and future trajectory (■) in the frame of the ego vehicle ($\bigstar$). We explore the following variations: (a) Full Observability. (b) Partial Observability with 50% of generating occlusions (■). (c) Limited Observability with all possible occlusions. (d) Full Observability with a single occlusion (used for Scene Informer training). Anchors ($\bullet$) are assigned to the last time step of observed agents and populated in the occlusion. Occluded agents and observations are grey (■).
  • Figure 4: Scene Informer predictions in crowded settings. The ■ visualizes predictions, with its intensity reflecting trajectory probability. The ● intensity signifies occupancy probability. The top row displays occupancy; the bottom row displays forecasted trajectories. The ● denotes trajectories from high-likelihood occupancy anchors. Our method reliably predicts significant occupancy for ground truth occluded objects, delivering realistic trajectories for all agents. In scenes 1-3, our model accurately gauges occluded agent positions and their future paths. In scene 4, with the ego ($\bigstar$) nearing a crosswalk with stationary vehicles, our approach anticipates a crossing pedestrian.
  • Figure 5: Impact of observed agents' histories on occlusion inference performance. We modify the observed trajectories of two agents (● and ▼). In Scene 1, we visualize a scenario where the ▼ is stationary and the ● is moving forward. For an anchor ●, our approach predicts occupancy with low probability and vertical motion from the top to the bottom of the intersection. In Scene 2, we visualize the predictions for a modified scenario in which the ▼ is quickly approaching the intersection and the ● is stationary. Scene Informer realistically adapts its occupancy and trajectory predictions based on the observed motion of other agents. It is now highly likely that the anchor in the middle of the occlusion is occupied, and a majority of predicted trajectories are horizontal, from the left to the right of the intersection.
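
The per-anchor outputs described in the captions for Figures 2 and 4 (an occupancy probability $p_{occ}$, $K$ trajectories $Y_{1:K}$, and their probabilities $p_{1:K}$) could be consumed roughly as sketched below. This is an illustrative sketch, not code from the paper. The `AnchorPrediction` container, the `most_likely_trajectories` helper, and the `occ_threshold` value of 0.5 are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class AnchorPrediction:
    """Hypothetical container for one anchor's outputs."""
    anchor_xy: Point                 # anchor position in the ego frame
    p_occ: float                     # predicted occupancy probability
    trajectories: List[List[Point]]  # K candidate trajectories Y_{1:K}
    traj_probs: List[float]          # probability of each trajectory p_{1:K}


def most_likely_trajectories(preds, occ_threshold=0.5):
    """Keep anchors with high occupancy and return their top trajectory.

    occ_threshold is an assumed cutoff, not a value from the paper.
    """
    kept = []
    for pred in preds:
        if pred.p_occ < occ_threshold:
            continue  # anchor is unlikely to be occupied; skip it
        # Index of the most probable of the K trajectories.
        best_k = max(range(len(pred.traj_probs)),
                     key=pred.traj_probs.__getitem__)
        kept.append((pred.anchor_xy, pred.trajectories[best_k]))
    return kept
```

A downstream planner might apply a lower threshold than a visualization would, since missing an occluded agent is typically costlier than a false alarm. The right value depends on the application.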