Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction

Dominik Rößle, Jeremias Gerner, Klaus Bogenberger, Daniel Cremers, Stefanie Schmidtner, Torsten Schön

TL;DR

This work introduces TempCoBEV, an independent temporal module that augments camera-based cooperative BEV map segmentation with historical embeddings from collaborating vehicles. It uses an importance-guided attention stack, consisting of an Importance Fusion module and a Temporal Fusion module based on deformable cross-attention, to fuse current and past BEV embeddings into a refined BEV map without retraining the base model. Pre-inferring the fused BEV embeddings and freezing the decoder makes training up to 24x faster. On the OPV2V dataset, TempCoBEV improves current-frame IoU by up to 2% and future-frame IoU under communication failures by up to 19%, indicating that historical cues make the model more robust to dropouts and occlusions. This makes cooperative perception in autonomous driving more reliable, especially when network conditions degrade.

Abstract

Accurate and comprehensive semantic segmentation of Bird's Eye View (BEV) is essential for ensuring safe and proactive navigation in autonomous driving. Although cooperative perception has exceeded the detection capabilities of single-agent systems, prevalent camera-based algorithms in cooperative perception neglect valuable information derived from historical observations. This limitation becomes critical during sensor failures or communication issues as cooperative perception reverts to single-agent perception, leading to degraded performance and incomplete BEV segmentation maps. This paper introduces TempCoBEV, a temporal module designed to incorporate historical cues into current observations, thereby improving the quality and reliability of BEV map segmentations. We propose an importance-guided attention architecture to effectively integrate temporal information that prioritizes relevant properties for BEV map segmentation. TempCoBEV is an independent temporal module that seamlessly integrates into state-of-the-art camera-based cooperative perception models. We demonstrate through extensive experiments on the OPV2V dataset that TempCoBEV performs better than non-temporal models in predicting current and future BEV map segmentations, particularly in scenarios involving communication failures. We show the efficacy of TempCoBEV and its capability to integrate historical cues into the current BEV map, improving predictions under optimal communication conditions by up to 2% and under communication failures by up to 19%. The code is available at https://github.com/cvims/TempCoBEV
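The idea of weighting current and historical embeddings by importance can be illustrated with a minimal sketch. This is not the authors' architecture: TempCoBEV learns its importance maps and fuses embeddings with deformable cross-attention, whereas the snippet below simply blends two single-channel BEV grids cell by cell with a softmax over given importance scores. The function name `importance_fuse` and all inputs are hypothetical, chosen only for illustration.

```python
import math


def importance_fuse(current, history, cur_scores, hist_scores):
    """Blend two BEV grids cell by cell using softmax-normalized importance.

    current, history: H x W grids of floats (one feature channel, for brevity).
    cur_scores, hist_scores: H x W grids of raw importance scores. These are
    assumed to be given here; in a learned model a small network would
    predict them.
    """
    fused = []
    for cur_row, hist_row, cs_row, hs_row in zip(
        current, history, cur_scores, hist_scores
    ):
        fused_row = []
        for c, h, cs, hs in zip(cur_row, hist_row, cs_row, hs_row):
            # Two-way softmax, shifted by the max for numerical stability.
            m = max(cs, hs)
            w_cur = math.exp(cs - m)
            w_hist = math.exp(hs - m)
            fused_row.append((w_cur * c + w_hist * h) / (w_cur + w_hist))
        fused.append(fused_row)
    return fused


# Toy example: the second cell lost its signal in the current frame
# (e.g. a collaborating vehicle dropped out), but the history still has it.
current = [[1.0, 0.0]]
history = [[1.0, 1.0]]
cur_scores = [[0.0, -100.0]]   # current frame judged unreliable in cell 2
hist_scores = [[0.0, 0.0]]
fused = importance_fuse(current, history, cur_scores, hist_scores)
```

In the toy example, the fused value in the second cell is taken almost entirely from the historical grid, which mirrors the paper's motivation: when communication fails, past observations can fill in what the current frame no longer sees.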

Paper Structure

This paper contains 16 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Illustration of the TempCoBEV integration. Different vehicles are shown in different colors. At each timestamp, a varying number of CAVs engage in information sharing. In the illustration of the BEV embeddings, the ego vehicle is shown in blue and is always centered; undetectable vehicles are represented in red, while detectable ones are shown in black. TempCoBEV incorporates current and historical processed embeddings, fusing them into a unified representation before feeding them into the decoder. The resulting output depicts the potential reconstruction of vehicles in green, leveraging historical cues.
  • Figure 2: Architecture of the Importance Fusion module, which predicts importance maps of the embeddings and synthesizes their information according to relative importance.
  • Figure 3: Architecture of TempCoBEV. The question mark and gray embedding refer to the unknown future BEV embedding. TempCoBEV uses historical embeddings, the importance fusion module, and the temporal fusion module to build the embedding for a historical information-integrated BEV prediction.
  • Figure 4: Comparison of IoU for different models over time. Solid lines indicate the extension with TempCoBEV. Dashed lines indicate the default model. The dashed green and orange lines overlap heavily.
  • Figure 5: Exemplary output visualization with communication failures (from $t+1$). The first row shows CoBEVT outputs. The second row shows CoBEVT paired with historical cues from TempCoBEV. Green markups are true positives. Red markups refer to false negatives. Orange markups are false positives.
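The true positive, false negative, and false positive markups in Figure 5 are the same quantities that make up the IoU reported in Figure 4. As a minimal sketch, assuming binary per-cell masks for a single class (the function names and the empty-mask convention are illustrative assumptions, not taken from the paper's code), IoU can be computed as TP / (TP + FP + FN):

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    between two binary H x W masks given as nested lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # predicted and present
            elif p and not t:
                fp += 1      # predicted but absent
            elif t and not p:
                fn += 1      # present but missed
    return tp, fp, fn


def iou(pred, target):
    """Intersection over Union for one class: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / denom if denom else 1.0


pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 1],
          [0, 0, 0]]
score = iou(pred, target)  # one TP, one FP, one FN
```

Here the prediction shares one cell with the target, adds one spurious cell, and misses one, so the score is one third.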