What Matters in Range View 3D Object Detection
Benjamin Wilson, Nicholas Autio Mitchell, Jhony Kaesemodel Pontes, James Hays
TL;DR
This work analyzes range-view 3D object detection and shows that a simple, well-tuned range-view model can achieve state-of-the-art results without the bells and whistles of prior literature. It identifies four key design decisions as the primary levers for performance and runtime: input feature dimensionality, 3D input encoding, 3D classification supervision, and range-based subsampling. The authors introduce two techniques. Dynamic 3D Centerness is a Gaussian proximity-based classification supervision signal. Range Subsampling reduces the number of proposals. Both prove effective on Argoverse 2 and Waymo Open, with a clear improvement in small-object detection. The resulting model is open-source and multi-class. It is competitive with voxel-based methods on Argoverse 2, establishes a new state-of-the-art among range-view models on Waymo Open, and runs at around 10 Hz. These findings suggest that simple, principled range-view techniques can match or exceed more complex approaches, and they point future range-view research toward efficient, scalable designs with practical impact.
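A Gaussian proximity-based classification target can be illustrated with the minimal sketch below. The source does not give the exact formula, so this is an assumption. Each query position (for example, a lidar point or a predicted box center) belonging to an object receives a soft target exp(-||p - c||^2 / (2 * sigma^2)), where c is the ground-truth object center. The function name `gaussian_proximity_targets` and the bandwidth `sigma` are hypothetical. The sketch does not model whatever makes the authors' version "dynamic."

```python
import math
from typing import List, Sequence


def gaussian_proximity_targets(
    positions: Sequence[Sequence[float]],
    center: Sequence[float],
    sigma: float = 1.0,
) -> List[float]:
    """Hypothetical soft classification targets in (0, 1] for one object.

    Assumed form: target = exp(-||p - c||^2 / (2 * sigma^2)), so a position at
    the object center gets 1.0 and the target decays smoothly with 3D distance.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    targets = []
    for p in positions:
        # Squared Euclidean distance between the query position and the center.
        sq_dist = sum((pi - ci) ** 2 for pi, ci in zip(p, center))
        targets.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return targets
```

For example, `gaussian_proximity_targets([(0, 0, 0), (1, 0, 0), (3, 4, 0)], (0, 0, 0))` assigns 1.0 to the center point and progressively smaller targets to the farther points. Compared with an IoU-based target, this choice needs only a distance computation and no box overlap.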
Abstract
Lidar-based perception pipelines rely on 3D object detection models to interpret complex scenes. While multiple representations for lidar exist, the range view is enticing because it losslessly encodes the entire lidar sensor output. In this work, we achieve state-of-the-art performance among range-view 3D object detection models without relying on several techniques proposed in past range-view literature. We explore range-view 3D object detection on two modern datasets with substantially different properties: Argoverse 2 and Waymo Open. Our investigation yields three key insights: (1) input feature dimensionality significantly influences overall performance; (2) surprisingly, a classification loss grounded in 3D spatial proximity performs as well as or better than more elaborate IoU-based losses; and (3) addressing non-uniform lidar density with a straightforward range subsampling technique outperforms existing multi-resolution, range-conditioned networks. Our experiments show that techniques proposed in recent range-view literature are not needed to achieve state-of-the-art performance. Combining these findings, we establish a new state-of-the-art model for range-view 3D object detection, improving AP by 2.2% on the Waymo Open dataset while maintaining a runtime of 10 Hz. We also establish the first range-view model on the Argoverse 2 dataset, outperforming strong voxel-based baselines. All models are multi-class and open-source. Code is available at https://github.com/benjaminrwilson/range-view-3d-detection.
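The idea behind range subsampling can be illustrated with the hypothetical sketch below, which is not the authors' implementation. Lidar returns are dense at short range and sparse at long range, so nearby objects produce many redundant range-image proposals. The sketch assumes each proposal is a range-image pixel `(row, col, range_m)`. It keeps only pixels on a coarser grid when they are close to the sensor and keeps every pixel far away. The name `range_subsample`, the bin edges, and the strides in `DEFAULT_BINS` are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence, Tuple

# (row, col, range in meters) of a range-image pixel that carries a proposal.
Proposal = Tuple[int, int, float]

# Hypothetical (upper range bound in meters, stride) pairs, checked in order:
# dense near-range pixels are thinned aggressively, far pixels are all kept.
DEFAULT_BINS: Tuple[Tuple[float, int], ...] = ((20.0, 4), (40.0, 2), (math.inf, 1))


def range_subsample(
    proposals: Sequence[Proposal],
    bins: Sequence[Tuple[float, int]] = DEFAULT_BINS,
) -> List[Proposal]:
    """Keep one proposal per stride x stride block, with a range-dependent stride."""
    kept = []
    for row, col, rng in proposals:
        # First bin whose upper bound exceeds this pixel's range; default stride 1.
        stride = next((s for upper, s in bins if rng < upper), 1)
        if row % stride == 0 and col % stride == 0:
            kept.append((row, col, rng))
    return kept
```

On a 4x4 patch of pixels at 10 m, only the pixel at `(0, 0)` survives, while a single pixel at 50 m is always kept. A strided grid is used here because it is one simple way to make proposal density roughly uniform across range before non-maximum suppression. The paper's actual rule may differ.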
