DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
Peidong Li, Wancheng Shen, Qihao Huang, Dixiao Cui
TL;DR
DualBEV addresses camera-based BEV perception by unifying the two view transformation (VT) strategies, 3D-to-2D and 2D-to-3D, in a probabilistic framework that estimates correspondences from BEV, projection, and image probabilities. It introduces HeightTrans for CNN-based 3D-to-2D VT and Prob-LSS to strengthen LSS-style 2D-to-3D VT. The Dual Feature Fusion (DFF) module fuses the two in one stage, producing robust BEV features under BEV probability guidance. The approach achieves state-of-the-art performance on nuScenes without a Transformer, reporting 55.2% mAP and 63.4% NDS on the test set, while maintaining near real-time efficiency through precomputation. Extensive ablations validate the contributions of the probabilistic measurements, Prob-Sampling, multi-height sampling, Prob-LSS, and the DFF fusion design, and qualitative visualizations confirm improved detection across ranges. Limitations include reliance on single-frame depth signals and the absence of a temporal module, suggesting future work to integrate temporal context and to extend the method to BEV segmentation or 3D occupancy tasks.
Abstract
Camera-based Bird's-Eye-View (BEV) perception often struggles between adopting 3D-to-2D or 2D-to-3D view transformation (VT). The 3D-to-2D VT typically employs a resource-intensive Transformer to establish robust correspondences between 3D and 2D features, while the 2D-to-3D VT utilizes the Lift-Splat-Shoot (LSS) pipeline for real-time applications, potentially missing distant information. To address these limitations, we propose DualBEV, a unified framework that utilizes a shared feature transformation incorporating three probabilistic measurements for both strategies. By considering dual-view correspondences in one stage, DualBEV effectively bridges the gap between these strategies, harnessing their individual strengths. Our method achieves state-of-the-art performance without a Transformer, delivering comparable efficiency to the LSS approach, with 55.2% mAP and 63.4% NDS on the nuScenes test set. Code is available at https://github.com/PeidongLi/DualBEV.
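
As a rough illustration of how three probabilistic measurements might weight a shared feature transformation, the sketch below aggregates sampled image features into a single BEV cell. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `weighted_bev_feature`, its arguments, and the purely multiplicative combination of the projection, image, and BEV probabilities are all hypothetical, since the abstract does not specify how the measurements are combined.

```python
from typing import List, Sequence


def weighted_bev_feature(
    image_features: Sequence[Sequence[float]],
    p_proj: Sequence[float],
    p_image: Sequence[float],
    p_bev: float,
) -> List[float]:
    """Aggregate N sampled image features (C channels each) into one BEV cell.

    Assumption for illustration only: each sample is weighted by the product
    of its projection and image probabilities, the weighted samples are
    summed, and the sum is scaled by the BEV probability of the cell.
    """
    if not (len(image_features) == len(p_proj) == len(p_image)):
        raise ValueError("one projection/image probability is needed per sample")

    channels = len(image_features[0]) if image_features else 0
    cell = [0.0] * channels

    for feature, w_proj, w_image in zip(image_features, p_proj, p_image):
        weight = w_proj * w_image  # hypothetical per-sample correspondence weight
        for c in range(channels):
            cell[c] += weight * feature[c]

    # Scale by the (hypothetical) probability that this BEV cell is occupied.
    return [p_bev * value for value in cell]


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0]]  # two samples, two channels
    print(weighted_bev_feature(features, [0.5, 0.5], [1.0, 0.0], 0.5))
```

In this toy setup, a sample whose image probability is zero contributes nothing to the cell, and a low BEV probability suppresses the whole cell regardless of the individual samples.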
