Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
Kuan-Chih Huang, Yi-Hsuan Tsai, Ming-Hsuan Yang
TL;DR
This work tackles the high annotation cost of 3D object detection by proposing VG-W3D, a multi-level visual guidance framework that learns a 3D detector from 2D annotations alone. It integrates three visual cues (feature-level objectness alignment, output-level 2D–3D box overlap via a $\text{GIoU}$-based loss, and training-level image-guided pseudo-label refinement) alongside a frustum-based proposal generator and a frozen 2D detector. On KITTI, the method uses no 3D labels yet compares favorably with weakly supervised baselines and is competitive with a method trained on 500 frames of 3D annotations, while leveraging off-the-shelf 2D detectors. The approach reduces annotation effort for 3D perception and offers a scalable way to integrate 2D visual signals into 3D understanding. Code is to be released publicly.
Abstract
Weakly supervised 3D object detection aims to learn a 3D detector with lower annotation cost, e.g., 2D labels. Unlike prior work, which still relies on a few accurate 3D annotations, we propose a framework to study how to leverage constraints between the 2D and 3D domains without requiring any 3D labels. Specifically, we employ visual data from three perspectives to establish connections between the 2D and 3D domains. First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions. Second, we develop an output-level constraint to enforce the overlap between the 2D and projected 3D box estimations. Finally, we apply a training-level constraint by producing accurate and consistent 3D pseudo-labels that align with the visual data. We conduct extensive experiments on the KITTI dataset to validate the effectiveness of the proposed three constraints. Without using any 3D labels, our method achieves favorable performance against state-of-the-art approaches and is competitive with a method that uses 3D annotations for 500 frames. Code will be made publicly available at https://github.com/kuanchihhuang/VG-W3D.
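To make the output-level constraint concrete, the following is a minimal sketch of how overlap between a 2D box and a projected 3D box could be scored with a GIoU-style loss. It is an illustration under stated assumptions, not the authors' implementation: the 3D box is axis-aligned in the camera frame (no yaw), the 3x4 projection matrix `P` and all function names are hypothetical, and the projected box is taken as the tightest 2D rectangle around the eight projected corners.

```python
from itertools import product


def box3d_corners(center, size):
    """Eight corners of an axis-aligned 3D box (assumption: no yaw)."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [(cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
            for dx, dy, dz in product((-1, 1), repeat=3)]


def project_to_image(points, P):
    """Project camera-frame 3D points with a 3x4 projection matrix P."""
    pixels = []
    for x, y, z in points:
        u, v, d = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in P)
        pixels.append((u / d, v / d))  # perspective divide
    return pixels


def enclosing_box(pixels):
    """Tightest axis-aligned 2D box (x1, y1, x2, y2) around the pixels."""
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    return (min(us), min(vs), max(us), max(vs))


def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest box enclosing both inputs; penalizes non-overlapping boxes.
    hull = ((max(a[2], b[2]) - min(a[0], b[0]))
            * (max(a[3], b[3]) - min(a[1], b[1])))
    return inter / union - (hull - union) / hull


def giou_loss(box2d, projected_box):
    """Loss in [0, 2]; zero when the two boxes coincide."""
    return 1.0 - giou(box2d, projected_box)


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 50).
P = [[100.0, 0.0, 50.0, 0.0],
     [0.0, 100.0, 50.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
corners = box3d_corners(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0))
projected = enclosing_box(project_to_image(corners, P))
loss = giou_loss((40.0, 40.0, 60.0, 60.0), projected)
```

Unlike plain IoU, the GIoU term stays informative when the two boxes do not overlap at all, which is one reason it is a common choice for box-overlap losses; a real detector would compute this on batched, differentiable tensors and handle box rotation.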
