HGNET: A Hierarchical Feature Guided Network for Occupancy Flow Field Prediction
Zhan Chen, Chen Tang, Lu Xiong
TL;DR
The paper tackles joint occupancy and flow field prediction for multi-agent traffic, addressing occlusions and temporal dependencies. It introduces HGNET, which pairs a transformer-based encoder that fuses vectorized histories, visual occupancy grids, and map features with a hierarchical decoder that uses a Feature-Guided Attention (FGAT) module and a Time Series Memory framework to model cross-target interactions and temporal dynamics. Occupancy is trained with a combined focal and cross-entropy loss and flow with a smooth L1 loss; the two are combined as $L = \frac{1}{h w T}(100 L_{occ} + L_f)$. The model ranks 3rd in the 2024 Waymo Occupancy and Flow Prediction Challenge, and ablations confirm the benefits of FGAT and temporal memory. The work advances prediction for autonomous driving by coupling flow and occupancy predictions more tightly across time and agents, improving robustness in complex, interactive scenarios.
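To make the combined objective concrete, the following is a minimal NumPy sketch of how $L = \frac{1}{h w T}(100 L_{occ} + L_f)$ could be evaluated. It is not the authors' implementation. Several details are assumptions that the summary does not specify: h and w are read as the grid height and width and T as the number of predicted time steps; $L_{occ}$ is taken as the unweighted sum of a focal term (with gamma = 2) and a binary cross-entropy term over all cells; $L_f$ is taken as a smooth L1 sum over all cells and both flow components, without masking; and the function names and array shapes are hypothetical.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    # Per-cell binary cross-entropy; p is a probability in [0, 1], y is 0 or 1.
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    # Per-cell focal loss; gamma = 2 is an assumed value, not taken from the paper.
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def smooth_l1(pred, target, beta=1.0):
    # Elementwise smooth L1: quadratic below beta, linear above it.
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

def combined_loss(occ_pred, occ_gt, flow_pred, flow_gt, occ_weight=100.0):
    # Assumed shapes: occupancy (T, h, w), flow (T, h, w, 2).
    T, h, w = occ_gt.shape
    l_occ = (focal_loss(occ_pred, occ_gt).sum()
             + binary_cross_entropy(occ_pred, occ_gt).sum())
    l_f = smooth_l1(flow_pred, flow_gt).sum()
    # L = (100 * L_occ + L_f) / (h * w * T)
    return (occ_weight * l_occ + l_f) / (h * w * T)
```

Under these assumptions the factor of 100 makes the occupancy term dominate whenever occupancy and flow errors are of similar size, which may be the intent behind the weighting.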
Abstract
Predicting the motion of multiple traffic participants has always been one of the most challenging tasks in autonomous driving. The recently proposed occupancy flow field prediction method has been shown to be a more effective and scalable representation than general trajectory prediction methods. However, in complex multi-agent traffic scenarios, it remains difficult to model the interactions among various factors and the dependencies among prediction outputs at different time steps. In view of this, we first propose a transformer-based hierarchical feature guided network (HGNET), which efficiently extracts features of agents and map information from visual and vectorized inputs and models multimodal interaction relationships. Second, we design the Feature-Guided Attention (FGAT) module to leverage the potential guiding effects between different prediction targets, thereby improving prediction accuracy. Third, to enhance the temporal consistency and causal relationships of the predictions, we propose a Time Series Memory framework that learns the conditional distribution models of the prediction outputs at future time steps from multivariate time series. The results demonstrate that our model achieves competitive performance, ranking 3rd in the 2024 Waymo Occupancy and Flow Prediction Challenge.
