Pre-Training LiDAR-Based 3D Object Detectors Through Colorization
Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao
TL;DR
LiDAR detectors require substantial labeled data, limiting scalability. This paper introduces Grounded Point Colorization (GPC), a pre-training approach that teaches the LiDAR backbone to colorize points while grounding colors with seed hints, thereby learning semantically meaningful representations without labels. By using color as contextual grounding and reframing colorization as a classification task with a balanced softmax across $K$ bins, GPC mitigates intrinsic color variation and selection bias. This leads to strong data-efficient gains on KITTI and Waymo, and the improvements carry over to voxel-based detectors. Across extensive ablations and qualitative analyses, the hint mechanism and color quantization prove crucial, enabling the backbone to infer object-level color consistency and segmentation cues that transfer to improved 3D detection with limited annotations. Overall, GPC offers a simple, effective, and non-contrastive SSL pathway that reduces annotation effort while boosting 3D perception for autonomous driving.
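To make the classification framing concrete, here is a minimal NumPy sketch of its ingredients: mapping colors to $K$ bins, scoring per-point logits with a balanced softmax, and choosing which points receive a ground-truth color hint. It is an illustration under assumptions, not the authors' implementation. The uniform RGB grid, the hint-sampling scheme, and the names `quantize_colors`, `balanced_softmax_loss`, and `make_hint_mask` are all hypothetical, and the paper's actual binning and seed selection may differ.

```python
import numpy as np

def quantize_colors(rgb, levels=4):
    """Map RGB values in [0, 1] to one of K = levels**3 bin indices.

    Assumption: a uniform per-channel grid; the paper's binning may differ.
    """
    rgb = np.asarray(rgb, dtype=float)
    # Per-channel level in 0..levels-1 (a value of exactly 1.0 is clamped).
    idx = np.minimum((rgb * levels).astype(int), levels - 1)
    return idx[:, 0] * levels * levels + idx[:, 1] * levels + idx[:, 2]

def balanced_softmax_loss(logits, targets, bin_counts):
    """Cross-entropy computed after adding the log bin frequency to the logits.

    Adding the log prior keeps frequent color bins from dominating training.
    """
    bin_counts = np.asarray(bin_counts, dtype=float)
    prior = np.log(bin_counts / bin_counts.sum() + 1e-12)
    z = logits + prior
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def make_hint_mask(num_points, hint_ratio, rng):
    """Boolean mask of seed points whose ground-truth color is revealed.

    Assumption: seeds are drawn independently at random per point.
    """
    return rng.random(num_points) < hint_ratio

# Toy usage: 5 points, K = 64 bins, random logits standing in for a backbone.
rng = np.random.default_rng(0)
colors = rng.random((5, 3))
targets = quantize_colors(colors, levels=4)
counts = np.bincount(targets, minlength=64) + 1  # +1 avoids empty bins
logits = rng.normal(size=(5, 64))
hints = make_hint_mask(len(colors), hint_ratio=0.2, rng=rng)
# Score only the points that did not receive a hint.
loss = balanced_softmax_loss(logits[~hints], targets[~hints], counts)
```

In this sketch the hinted points would be fed to the backbone together with their colors, and the loss is taken over the remaining points only. That split is one plausible reading of "grounding colors with seed hints", not a detail confirmed by the summary.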
Abstract
Accurate 3D object detection and understanding for self-driving cars rely heavily on LiDAR point clouds, and training such models requires large amounts of labeled data. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.
