DiffuBox: Refining 3D Object Detection with Point Diffusion

Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

TL;DR

This work tackles domain shift in 3D object detection by introducing DiffuBox, a diffusion-based post-processing module that refines noisy 7-DoF bounding boxes using the LiDAR points local to each proposal. It uses a normalized box view to achieve size-invariant localization and learns a point-cloud diffusion model that denoises box-relative point distributions, enabling domain-robust refinement without retraining on the target domain. The approach yields substantial improvements (e.g., up to 24 mAP in cross-domain tests) across multiple detectors and object classes, particularly at near range, and can be extended to detector retraining with refined labels. The method offers a detector-agnostic, scalable way to reduce domain gaps in autonomous driving perception, with potential for broader label refinement and sensor fusion applications.

Abstract

Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors. Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox.

Paper Structure (41 sections, 9 equations, 10 figures, 14 tables, 2 algorithms)

Figures (10)

  • Figure 1: Box refinement through denoising steps. We visualize the correction of a noisy prediction, shown in yellow, using DiffuBox. The ground truth box is visualized in green for reference. Boxes being refined are colored blue based on timestep. The output is refined iteratively through the denoising steps, resulting in the final, corrected output of our method.
  • Figure 2: Example Car objects converted into normalized box view (NBV). Foreground and background points are marked in black and gray, respectively, for better visualization. Foreground LiDAR points that lie tightly within the ${\left[-1, 1\right]}^3$ NBV cube are a domain-consistent sign of good localization.
  • Figure 3: Illustration of 3D object detection on Lyft/Ithaca365 before and after DiffuBox's refinement. We visualize detections from an out-of-domain PointRCNN on four scenes from each dataset. We color the ground truth boxes in green, the detector outputs in yellow, and DiffuBox's refinements in blue. The out-of-domain detector sometimes produces false positives or boxes with incorrect shape or alignment. DiffuBox effectively improves the wrong or inaccurate boxes, while making little change to the accurate boxes.
  • Figure 4: Comparison of bounding box quality before and after refinement with DiffuBox. We report the distribution of Intersection over Union (IoU) with ground-truth labels from the Lyft dataset. The unrefined predictions are from an unadapted PointRCNN model trained on KITTI. We show that DiffuBox leads to significant improvement in bounding box localization.
  • Figure 5: mAP vs. Number of Diffusion Steps. We report the BEV (left) and 3D (right) mAP @ IoU 0.7 for the setting of KITTI $\rightarrow$ Lyft Cars and PointRCNN detector.
  • ...and 5 more figures
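
The normalized box view described in the Figure 2 caption can be illustrated with a minimal sketch. It is not taken from the DiffuBox repository: the function name `to_normalized_box_view` is hypothetical, and the box layout (center, length, width, height, yaw about the z axis) is an assumption that may differ from the paper's exact convention. The idea is to translate points to the box center, rotate them by the negative yaw, and divide by the half-dimensions, so that a well-localized object fills the $[-1, 1]^3$ cube regardless of its physical size.

```python
import numpy as np


def to_normalized_box_view(points, box):
    """Map LiDAR points of shape (N, 3) into the frame of a 7-DoF box.

    Assumed layout (hypothetical, for illustration only):
    box = (cx, cy, cz, length, width, height, yaw), with yaw about the z axis.
    Points that lie inside the box land in the cube [-1, 1]^3.
    """
    points = np.asarray(points, dtype=float)
    cx, cy, cz, length, width, height, yaw = box

    # 1. Translate so the box center becomes the origin.
    shifted = points - np.array([cx, cy, cz])

    # 2. Rotate by -yaw so the box heading aligns with the +x axis.
    c, s = np.cos(yaw), np.sin(yaw)
    rot_neg_yaw = np.array([[c, s, 0.0],
                            [-s, c, 0.0],
                            [0.0, 0.0, 1.0]])
    aligned = shifted @ rot_neg_yaw.T

    # 3. Scale each axis by the half-dimension so box faces sit at +/-1.
    half_dims = np.array([length, width, height]) / 2.0
    return aligned / half_dims


def fraction_inside_cube(nbv_points):
    """Fraction of points falling inside the [-1, 1]^3 NBV cube."""
    inside = np.all(np.abs(nbv_points) <= 1.0, axis=1)
    return float(inside.mean())
```

As a usage example, a box centered at (10, 5, 0) with length 4, width 2, height 1.5 and yaw π/2 maps its front-face center (10, 7, 0) to (1, 0, 0) and its center to the origin. Because the output is expressed in units of the box's own half-dimensions, the same point pattern is expected for a small car and a large one, which is the size-invariance property the TL;DR refers to.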