Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
Mellon M. Zhang, Glen Chou, Saibal Mukhopadhyay
TL;DR
This work addresses the latency-accuracy trade-off in LiDAR-based 3D detection by introducing Polar-Fast-Cartesian-Full (PFCF), a hybrid detector that combines a fast polar streaming backbone with a lightweight Cartesian full-scan backbone through a Sector Feature Buffer (SFB). Central to PFCF is Polar Hierarchical Mamba (PHiM), a polar-native state-space backbone that uses dimensionally-decomposed convolutions to mitigate polar distortion while preserving streaming efficiency. The approach establishes a new Pareto frontier on the Waymo Open Dataset. It surpasses prior streaming methods by about 10% mAP, matches full-scan accuracy at roughly twice the update rate, and generalizes well to nuScenes. Together, SFB-based cross-sector fusion, PHiM's temporal-spatial modeling, and distortion-aware feature learning enable full-scene predictions from streaming inputs, a practical benefit for real-time autonomous driving perception.
Abstract
Accurate and low-latency 3D object detection is essential for autonomous driving, where safety hinges on both rapid response and reliable perception. While rotating LiDAR sensors are widely adopted for their robustness and fidelity, current detectors face a trade-off: streaming methods process partial polar sectors on the fly for fast updates but suffer from limited visibility, cross-sector dependencies, and distortions from retrofitted Cartesian designs, whereas full-scan methods achieve higher accuracy but are bottlenecked by the inherent latency of a LiDAR revolution. We propose Polar-Fast-Cartesian-Full (PFCF), a hybrid detector that combines fast polar processing for intra-sector feature extraction with accurate Cartesian reasoning for full-scene understanding. Central to PFCF is a custom Mamba SSM-based streaming backbone with dimensionally-decomposed convolutions that avoids distortion-heavy planes, enabling parameter-efficient, translation-invariant, and distortion-robust polar representation learning. Local sector features are extracted via this backbone, then accumulated into a sector feature buffer to enable efficient inter-sector communication through a full-scan backbone. PFCF establishes a new Pareto frontier on the Waymo Open Dataset, surpassing prior streaming baselines by 10% mAP and matching full-scan accuracy at twice the update rate. Code is available at https://github.com/meilongzhang/Polar-Hierarchical-Mamba.
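To make the streaming/full-scan data flow concrete, the following is a minimal Python sketch under stated assumptions. Points are bucketed into equal azimuthal sectors, each sector is summarized by a feature as it arrives, and a buffer keeps the latest feature per sector so that a full-scan stage can read the whole revolution at any time. The names `sector_index`, `toy_sector_feature`, `SectorFeatureBuffer`, and `stream` are hypothetical. The per-sector mean is only a placeholder for the learned PHiM backbone, and the overwrite-by-slot policy is an assumption, not the authors' Sector Feature Buffer implementation (the linked repository holds that).

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor frame


def sector_index(x: float, y: float, num_sectors: int) -> int:
    """Map a point's azimuth to one of `num_sectors` equal polar wedges."""
    azimuth = math.atan2(y, x) % (2 * math.pi)  # wrap into [0, 2*pi)
    idx = int(azimuth / (2 * math.pi / num_sectors))
    return min(idx, num_sectors - 1)  # guard against float round-up to 2*pi


def toy_sector_feature(points: Sequence[Point]) -> List[float]:
    """Hypothetical stand-in for a learned backbone: mean (x, y, z) of a sector."""
    n = len(points)
    if n == 0:
        return [0.0, 0.0, 0.0]
    return [sum(p[i] for p in points) / n for i in range(3)]


class SectorFeatureBuffer:
    """Hypothetical buffer holding the most recent feature for every sector."""

    def __init__(self, num_sectors: int, feature_dim: int) -> None:
        self.num_sectors = num_sectors
        self.feature_dim = feature_dim
        self.slots: List[Optional[List[float]]] = [None] * num_sectors

    def update(self, sector_id: int, feature: List[float]) -> None:
        # Overwrite only the slot of the sector that just arrived; the other
        # slots keep their features from earlier in the sweep.
        self.slots[sector_id] = feature

    def is_full(self) -> bool:
        # True once every sector of the revolution has been seen at least once.
        return all(s is not None for s in self.slots)

    def full_scene(self) -> List[List[float]]:
        # Features in azimuthal order, as input to a full-scan stage;
        # sectors not yet seen are zero-filled.
        return [s if s is not None else [0.0] * self.feature_dim for s in self.slots]


def stream(points: Sequence[Point], num_sectors: int) -> SectorFeatureBuffer:
    """Feed sectors in azimuth order, as a rotating sensor would deliver them."""
    buckets: List[List[Point]] = [[] for _ in range(num_sectors)]
    for p in points:
        buckets[sector_index(p[0], p[1], num_sectors)].append(p)
    buf = SectorFeatureBuffer(num_sectors, feature_dim=3)
    for sid, bucket in enumerate(buckets):
        buf.update(sid, toy_sector_feature(bucket))
        # A streaming detector could emit predictions here, after each sector.
    return buf


if __name__ == "__main__":
    scan = [(1.0, 1.0, 0.0), (2.0, 2.0, 2.0), (-1.0, -1.0, 1.0)]
    print(stream(scan, num_sectors=4).full_scene())
```

In this sketch the buffer can be read after every sector rather than once per revolution; slots for sectors not yet revisited simply hold features from earlier in the sweep.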
