Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

TL;DR

LiDAR detectors require substantial labeled data, limiting scalability. This paper introduces Grounded Point Colorization (GPC), a pre-training approach that teaches the LiDAR backbone to colorize points while grounding colors with seed hints, thereby learning semantically meaningful representations without labels. By using color as contextual grounding and reframing colorization as a classification task with a balanced softmax across $K$ bins, GPC mitigates intrinsic color variation and selection bias, leading to strong data-efficient gains on KITTI and Waymo, and compatible improvements for voxel-based detectors. Across extensive ablations and qualitative analyses, the hint mechanism and color quantization prove crucial, enabling the backbone to infer object-level color consistency and segmentation cues that transfer to improved 3D detection with limited annotations. Overall, GPC offers a simple, effective, and non-contrastive SSL pathway that reduces annotation effort while boosting 3D perception for autonomous driving.
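
The balanced softmax mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which the log of each bin's empirical frequency is added to the logits before the cross-entropy is taken; the exact loss used in the paper is not given in this summary. The names `balanced_softmax_loss` and `bin_counts` are illustrative, not taken from the authors' code.

```python
import numpy as np

def balanced_softmax_loss(logits, targets, bin_counts):
    """Mean cross-entropy with log bin-frequency priors added to the logits.

    logits:     (N, K) raw scores per point over K color bins
    targets:    (N,)   integer ground-truth bin index per point
    bin_counts: (K,)   how often each bin occurs in the training data
    """
    counts = np.asarray(bin_counts, dtype=np.float64)
    # Assumption: the prior is the empirical bin frequency, clipped to avoid log(0).
    prior = np.clip(counts / counts.sum(), 1e-12, None)
    # Frequent bins receive a larger additive offset during training.
    shifted = np.asarray(logits, dtype=np.float64) + np.log(prior)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = shifted - shifted.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Example: 4 points, K = 3 bins, with bin 0 (say, gray road) dominating the data.
logits = np.array([[1.0, 0.2, -0.5],
                   [0.1, 0.9, 0.0],
                   [0.3, 0.3, 0.3],
                   [2.0, -1.0, 0.5]])
targets = np.array([0, 1, 2, 0])
loss = balanced_softmax_loss(logits, targets, bin_counts=[900, 80, 20])
```

Because the prior already accounts for how common a bin is, the network's own logits do not need to encode bin frequency, which is one way to counter a bias toward dominant colors. With uniform `bin_counts` the function reduces to the ordinary softmax cross-entropy.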

Abstract

Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.
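
One way to picture "providing ground-truth colors as hints" is sketched below. This is only an illustration under stated assumptions: seeds are chosen uniformly at random, hints are encoded as one-hot color-bin vectors appended to the point coordinates, and non-seed points receive all-zero hints. The function name `build_hinted_input` and the `seed_ratio` default are hypothetical; the abstract does not specify how the hints are encoded or how many seeds are used.

```python
import numpy as np

def build_hinted_input(xyz, color_bins, num_bins, seed_ratio=0.2, rng=None):
    """Attach one-hot color hints to a random seed subset of points.

    xyz:        (N, 3) point coordinates
    color_bins: (N,)   ground-truth color bin index per point
    Returns the (N, 3 + num_bins) network input and a boolean seed mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = xyz.shape[0]
    # Hypothetical choice: a fixed fraction of points act as seeds.
    num_seeds = int(round(seed_ratio * n))
    seed_idx = rng.choice(n, size=num_seeds, replace=False)

    seed_mask = np.zeros(n, dtype=bool)
    seed_mask[seed_idx] = True

    # Seed points expose their true color bin; all other points get zeros.
    hints = np.zeros((n, num_bins))
    hints[seed_idx, color_bins[seed_idx]] = 1.0
    return np.concatenate([xyz, hints], axis=1), seed_mask

# Example with synthetic data: 10 points, 4 color bins, 30% seeds.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(10, 3))
bins = rng.integers(0, 4, size=10)
inputs, seed_mask = build_hinted_input(xyz, bins, num_bins=4, seed_ratio=0.3, rng=rng)
# A colorization loss would then typically be evaluated on points where
# seed_mask is False, since the seeds already carry their answer.
```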

Paper Structure

This paper contains 46 sections, 5 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Illustration of the proposed Grounded Point Colorization (GPC). We pre-train the detector backbone by colorization (left to right), taking the hints on a seed set of points (middle) to overcome the inherent color variation. We show that the middle step is the key to learning the semantic cues of objects from colors.
  • Figure 2: Architecture of GPC. The key insight is grounding the pre-training colorization process on the hints, allowing the model backbone to focus on learning semantically meaningful representations that indicate which subsets (i.e., segments) of points should be colored similarly to facilitate downstream 3D object detection.
  • Figure 3: Color quantization. We apply the K-Means algorithm to cluster RGB pixel values into discrete bins. The resulting image with $128$ bins is hardly distinguishable from the original image to the human eye.
  • Figure 4: Qualitative results of colorization. (left) We run the trained GPC on validation LiDAR (b) without any seeds and with the seeds in (c), resulting in (d). The model has learned the biases of driving scenes: the ground is gray and black, trees are green, and tail lights are red. Without seeds, however, it predicts an average color for all cars because car colors vary. When some hints are provided, it colorizes the point cloud correctly. We hypothesize that this ability to know where to propagate a color is essential for downstream 3D detection. (right) (e) We further manipulate the seeds by hand. (f) GPC successfully colorizes the regions with the colors given in (e).
  • Figure S1: Training curves on 5% KITTI. (left) Performance ($\uparrow$) on the KITTI validation set and (right) loss ($\downarrow$) on the training set during fine-tuning. The curves show the effectiveness and efficiency of GPC compared to training from scratch.
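
The color quantization described in the Figure 3 caption can be sketched with a plain K-Means (Lloyd) loop. This is a minimal NumPy illustration, not the authors' implementation: the initialization, iteration count, and the function name `kmeans_color_bins` are assumptions, and only the use of K-Means and the $128$-bin setting come from the caption.

```python
import numpy as np

def kmeans_color_bins(rgb, num_bins=128, num_iters=20, seed=0):
    """Cluster RGB values into num_bins centers with plain Lloyd iterations.

    rgb: (N, 3) array of pixel colors, with N >= num_bins.
    Returns (centers, labels): (num_bins, 3) bin colors and (N,) bin indices.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers from randomly chosen pixels.
    centers = rgb[rng.choice(len(rgb), size=num_bins, replace=False)].copy()
    for _ in range(num_iters):
        # Squared distance from every pixel to every center: shape (N, num_bins).
        d2 = ((rgb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(num_bins):
            members = rgb[labels == k]
            if len(members) > 0:  # keep the old center if a bin is empty
                centers[k] = members.mean(axis=0)
    # Final assignment against the updated centers.
    d2 = ((rgb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Example on synthetic pixels; a real image would be reshaped to (H * W, 3).
pixels = np.random.default_rng(1).integers(0, 256, size=(2000, 3))
centers, labels = kmeans_color_bins(pixels, num_bins=128)
quantized = centers[labels]  # each pixel replaced by its bin color
```

The `labels` array is what turns colorization into a classification problem: each point's target becomes a bin index rather than a continuous RGB value. The pairwise distance array here grows as N × `num_bins`, so for full-resolution images one would likely fit the centers on a subsample of pixels or use a library implementation such as scikit-learn's `KMeans`.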