PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation
Shoumeng Qiu, Xinrun Li, XiangYang Xue, Jian Pu
TL;DR
This work tackles the inefficiency of cross-view fusion for LiDAR semantic segmentation by proposing PC-BEV, a Polar-Cartesian BEV fusion framework that operates entirely in BEV space using fixed correspondences between polar and Cartesian partitions. A remap-based fusion method enables dense, memory-efficient feature interaction, making the fusion step 170× faster than point-based fusion while preserving contextual information. A Transformer-CNN Mixture Architecture provides global scene understanding plus local refinement for BEV features, delivering strong accuracy with real-time inference on SemanticKITTI and nuScenes. The combination of BEV-only fusion and efficient remapping demonstrates that the benefits of fusion can be obtained without expensive point-wise interactions, offering practical benefits for autonomous driving systems. Code is available at https://github.com/skyshoumeng/PC-BEV.
Abstract
Although multiview fusion has demonstrated potential in LiDAR segmentation, its dependence on computationally intensive point-based interactions, arising from the lack of fixed correspondences between views such as range view and Bird's-Eye View (BEV), hinders its practical deployment. This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance. We demonstrate that significant gains can be realized by directly fusing Polar and Cartesian partitioning strategies within the BEV space. Our proposed BEV-only segmentation model leverages the inherent fixed grid correspondences between these partitioning schemes, enabling a fusion process that is orders of magnitude faster (170$\times$ speedup) than conventional point-based methods. Furthermore, our approach facilitates dense feature fusion, preserving richer contextual information compared to sparse point-based alternatives. To enhance scene understanding while maintaining inference efficiency, we also introduce a hybrid Transformer-CNN architecture. Extensive evaluation on the SemanticKITTI and nuScenes datasets provides compelling evidence that our method outperforms previous multiview fusion approaches in terms of both performance and inference speed, highlighting the potential of BEV-based fusion for LiDAR segmentation. Code is available at \url{https://github.com/skyshoumeng/PC-BEV}.
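To make the idea of fixed grid correspondences concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a square Cartesian BEV grid centered on the sensor and a polar grid with uniform radial and angular bins; the function names, grid sizes, nearest-cell lookup, and fusion by channel concatenation are all illustrative assumptions. The point it shows is that the polar-to-Cartesian index map depends only on the grid geometry, so it can be computed once and reused for every scan as a dense gather, with no per-point interaction.

```python
import numpy as np


def build_polar_to_cartesian_index(h, w, n_r, n_theta, max_range):
    """Precompute, for each Cartesian cell, the polar cell containing its center.

    Hypothetical helper: assumes the Cartesian grid covers
    [-max_range, max_range]^2 and the polar grid covers r in [0, max_range)
    and theta in [-pi, pi) with uniform bins.
    """
    # Cell-center coordinates of the Cartesian grid.
    ys = (np.arange(h) + 0.5) / h * 2 * max_range - max_range
    xs = (np.arange(w) + 0.5) / w * 2 * max_range - max_range
    yy, xx = np.meshgrid(ys, xs, indexing="ij")

    # Convert each center to polar coordinates and quantize to bin indices.
    r = np.hypot(xx, yy)
    theta = np.arctan2(yy, xx)
    r_idx = np.floor(r / max_range * n_r).astype(np.int64)
    t_idx = np.floor((theta + np.pi) / (2 * np.pi) * n_theta).astype(np.int64)
    t_idx = t_idx % n_theta  # theta == pi wraps to the first angular bin

    # Corners of the square lie beyond max_range and have no polar counterpart.
    valid = r_idx < n_r
    r_idx = np.clip(r_idx, 0, n_r - 1)
    return r_idx, t_idx, valid


def remap_polar_to_cartesian(polar_feat, r_idx, t_idx, valid):
    """Gather polar features (C, n_r, n_theta) into the Cartesian layout (C, h, w)."""
    remapped = polar_feat[:, r_idx, t_idx]
    return remapped * valid  # zero out cells with no polar counterpart


def fuse(cart_feat, polar_feat, index):
    """Fuse by channel concatenation (an assumption; other fusion ops are possible)."""
    remapped = remap_polar_to_cartesian(polar_feat, *index)
    return np.concatenate([cart_feat, remapped], axis=0)


# The index is built once and reused for every frame.
index = build_polar_to_cartesian_index(h=4, w=4, n_r=2, n_theta=4, max_range=1.0)
polar = (np.arange(2)[:, None] * 10 + np.arange(4)[None, :]).astype(float)[None]
cart = np.ones((3, 4, 4))
fused = fuse(cart, polar, index)
```

A nearest-cell gather is used here for brevity; an interpolating sampler over the same precomputed coordinates would follow the same pattern. Because every Cartesian cell receives a value, the fusion is dense, in contrast to point-based fusion, which only touches cells that contain points.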
