DiffuBox: Refining 3D Object Detection with Point Diffusion
Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger
TL;DR
This work tackles domain shift in 3D object detection by introducing DiffuBox, a diffusion-based post-processing module that refines noisy 7-DoF bounding boxes using LiDAR points local to each proposal. It leverages a normalized box view to achieve size-invariant localization and learns a point-cloud diffusion model that denoises box-relative point distributions, enabling domain-robust refinement without re-training on the target domain. The approach yields substantial improvements (e.g., up to 24 mAP in cross-domain tests) across multiple detectors and object classes, particularly at near-range, and can be extended to detector retraining with refined labels. The method offers a detector-agnostic, scalable solution to reduce domain gaps in autonomous driving perception, with potential for broader label refinement and sensor fusion applications.
Abstract
Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors. Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox.
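
The TL;DR refers to a normalized box view in which points are expressed relative to a proposal box, so that localization does not depend on object size. The sketch below illustrates one way such a transform could look. It is not taken from the DiffuBox repository: the function name `to_normalized_box_view`, the `(cx, cy, cz, length, width, height, yaw)` box layout, and the choice to map the box onto the cube [-1, 1]^3 are assumptions made for illustration.

```python
import math


def to_normalized_box_view(points, box):
    """Map LiDAR points into a box-relative, size-normalized frame.

    Hypothetical helper, not the authors' code.
    points: iterable of (x, y, z) tuples in the sensor frame.
    box:    (cx, cy, cz, length, width, height, yaw), a 7-DoF box whose
            yaw is a rotation about the z-axis (assumed layout).
    Returns a list of (u, v, w) tuples; points inside the box fall in
    [-1, 1] on every axis (assumed normalization).
    """
    cx, cy, cz, length, width, height, yaw = box
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    normalized = []
    for x, y, z in points:
        # Translate so the box center becomes the origin.
        dx, dy, dz = x - cx, y - cy, z - cz
        # Rotate by -yaw so the box heading aligns with the +x axis.
        bx = cos_y * dx + sin_y * dy
        by = -sin_y * dx + cos_y * dy
        # Divide by the half-extents so the box becomes a unit cube.
        normalized.append((2.0 * bx / length, 2.0 * by / width, 2.0 * dz / height))
    return normalized


# Usage: the front-face center of a box maps to roughly (1, 0, 0)
# regardless of the box's size, position, or heading.
box = (10.0, 5.0, 1.0, 4.0, 2.0, 1.5, math.pi / 2)
front_center = (10.0, 7.0, 1.0)
print(to_normalized_box_view([front_center], box))
```

Under these assumptions, a car and a pedestrian whose boxes fit equally well produce point distributions with the same extent in the normalized frame, which is the kind of size invariance the TL;DR attributes to the normalized box view.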
