Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction
Dominik Rößle, Jeremias Gerner, Klaus Bogenberger, Daniel Cremers, Stefanie Schmidtner, Torsten Schön
TL;DR
This work introduces TempCoBEV, an independent temporal module that augments camera-based cooperative BEV map segmentation with historical embeddings from collaborating vehicles. It uses an importance-guided attention stack, consisting of an Importance Fusion Module and a Temporal Fusion Module based on deformable cross-attention, to fuse current and past BEV embeddings into a refined BEV map without retraining the base model. Training is made more efficient by pre-inferring the fused BEV embeddings and freezing the decoder, which yields up to 24x faster training. Evaluations on the OPV2V dataset show that TempCoBEV improves current-frame IoU by up to 2% and future-frame IoU under communication failures by up to 19%, indicating that historical cues make the model more robust to dropouts and occlusions. The approach enables more reliable cooperative perception in autonomous driving, especially when network conditions degrade.
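To make the idea of importance-guided fusion concrete, the following is a minimal sketch under strong simplifying assumptions. It blends a current and a historical BEV embedding cell by cell, using softmax weights over per-cell importance scores. It is not the TempCoBEV architecture: the deformable cross-attention is replaced by a plain per-cell weighted average, and the function names, grid layout, and importance scores are all hypothetical.

```python
import math


def fuse_cell(current, history, score_current, score_history):
    """Blend two embedding vectors of one BEV cell.

    The weights are a two-way softmax over the (hypothetical) importance
    scores, so the embedding with the higher score dominates the result.
    """
    m = max(score_current, score_history)  # subtract max for numerical stability
    w_cur = math.exp(score_current - m)
    w_hist = math.exp(score_history - m)
    total = w_cur + w_hist
    return [(w_cur * c + w_hist * h) / total for c, h in zip(current, history)]


def fuse_bev(current, history, importance_current, importance_history):
    """Fuse two H x W x C BEV grids (nested lists) cell by cell."""
    return [
        [
            fuse_cell(
                current[i][j],
                history[i][j],
                importance_current[i][j],
                importance_history[i][j],
            )
            for j in range(len(current[i]))
        ]
        for i in range(len(current))
    ]


# Toy 1 x 2 grid with 2 channels. The first cell mimics a dropout: the current
# embedding is empty and receives a very low importance score.
current = [[[0.0, 0.0], [1.0, 1.0]]]
history = [[[2.0, 2.0], [3.0, 3.0]]]
importance_current = [[-20.0, 0.0]]
importance_history = [[20.0, 0.0]]

fused = fuse_bev(current, history, importance_current, importance_history)
```

In this toy example the first cell is filled almost entirely from the historical embedding, while the second cell, where both scores are equal, is the plain average of the two embeddings.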
Abstract
Accurate and comprehensive Bird's Eye View (BEV) semantic segmentation is essential for ensuring safe and proactive navigation in autonomous driving. Although cooperative perception has exceeded the detection capabilities of single-agent systems, prevalent camera-based algorithms in cooperative perception neglect valuable information derived from historical observations. This limitation becomes critical during sensor failures or communication issues, when cooperative perception reverts to single-agent perception, leading to degraded performance and incomplete BEV segmentation maps. This paper introduces TempCoBEV, a temporal module designed to incorporate historical cues into current observations, thereby improving the quality and reliability of BEV map segmentations. We propose an importance-guided attention architecture that integrates temporal information effectively and prioritizes the properties relevant to BEV map segmentation. TempCoBEV is an independent temporal module that integrates seamlessly into state-of-the-art camera-based cooperative perception models. We demonstrate through extensive experiments on the OPV2V dataset that TempCoBEV outperforms non-temporal models in predicting current and future BEV map segmentations, particularly in scenarios involving communication failures. We show the efficacy of TempCoBEV and its capability to integrate historical cues into the current BEV map, improving predictions under optimal communication conditions by up to 2% and under communication failures by up to 19%. The code is available at https://github.com/cvims/TempCoBEV
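The improvements quoted above are reported in terms of intersection over union (IoU). As a reminder of how that metric is computed for a BEV segmentation map, here is a minimal sketch. It assumes the prediction and the ground truth are given as 2D grids of integer class labels; the function name and the toy grids are illustrative and are not taken from the released code.

```python
def iou(pred, target, cls):
    """Intersection over union of one class between two label grids.

    pred and target are equally sized 2D lists of integer class labels.
    Returns NaN when the class appears in neither grid.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_hit = p == cls
            t_hit = t == cls
            intersection += p_hit and t_hit  # cell labeled cls in both grids
            union += p_hit or t_hit          # cell labeled cls in at least one
    return intersection / union if union else float("nan")


# Toy 2 x 3 BEV maps with classes 0 (background) and 1 (vehicle).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

vehicle_iou = iou(pred, target, cls=1)  # 2 shared cells out of 4 -> 0.5
```

Here two cells are labeled as vehicle in both maps and four cells are labeled as vehicle in at least one of them, which gives an IoU of 0.5.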
