
Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

TL;DR

This work tackles the memory bottleneck in semi-supervised 3D object detection by introducing PatchTeacher, which trains on high-resolution patches of a scene to produce high-quality pseudo labels, and PillarMix, a cross-scan pillar augmentation that enriches the training data for semi-supervised learning (SSL). PatchTeacher relies on three techniques (Patch Normalizer, Quadrant Align, and Fovea Selection) to stabilize training on patches and maintain geometric consistency, while PillarMix mixes pillars from different LiDAR scans to create diverse, edge-rich samples. Together, they yield state-of-the-art results on Waymo and ONCE across varying labeled-data regimes and detectors, with strong gains over prior SSL methods such as ProficientTeacher and MeanTeacher. The approach offers a scalable, effective way to exploit unlabeled data for robust 3D detection in real-world driving scenarios.
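Why patches relieve the memory bottleneck can be seen from a rough cell-count argument. It assumes, for illustration only, that memory is dominated by a dense bird's-eye-view (BEV) grid, that the scene is a square of side L voxelized with voxel size v, and that it is split into k × k equal patches; these are simplifying assumptions, not the paper's exact configuration.

```latex
N_{\text{full}}(v) = \left(\frac{L}{v}\right)^{2}, \qquad
N_{\text{full}}\!\left(\frac{v}{k}\right) = \left(\frac{kL}{v}\right)^{2} = k^{2}\left(\frac{L}{v}\right)^{2}, \qquad
N_{\text{patch}}\!\left(\frac{v}{k}\right) = \left(\frac{L/k}{v/k}\right)^{2} = \left(\frac{L}{v}\right)^{2} = N_{\text{full}}(v).
```

For example, with L = 150.4 m and v = 0.1 m the full scene is a 1504 × 1504 grid. Voxelizing the whole scene at 0.05 m would need 3008 × 3008 cells, whereas with k = 2 each 75.2 m patch at 0.05 m is again 1504 × 1504. Peak memory per forward pass therefore stays at the coarse full-scene level, at the cost of running the k² patches one after another.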

Abstract

Semi-supervised learning aims to leverage large amounts of unlabeled data to improve model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of these pseudo labels is essential to the final performance. In this paper, we propose PatchTeacher, which focuses on partial-scene 3D object detection to provide high-quality pseudo labels for the student. Specifically, we divide a complete scene into a series of patches and feed them to PatchTeacher sequentially. Because partial-scene detection consumes little memory, PatchTeacher can process point clouds with a high-resolution voxelization, which minimizes the information lost to quantization and extracts more fine-grained features. However, training a detector on fractions of a scene is non-trivial. We therefore introduce three key techniques, i.e., Patch Normalizer, Quadrant Align, and Fovea Selection, to improve the performance of PatchTeacher. Moreover, we devise PillarMix, a strong data augmentation strategy that mixes truncated pillars from different LiDAR scans to generate diverse training samples and thus helps the model learn more general representations. Extensive experiments on the Waymo and ONCE datasets verify the effectiveness and superiority of our method: we achieve new state-of-the-art results, surpassing existing methods by a large margin. Code is available at https://github.com/LittlePey/PTPM.
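The patch-based processing described in the abstract can be sketched as follows. This is a minimal illustration, not the PTPM code: the four-quadrant split around the ego vehicle, the ±75.2 m scene range, the 0.05 m voxel size, and the helper names (split_into_quadrants, patch_bounds, voxelize_patch) are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, ego-centred frame
Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def split_into_quadrants(points: List[Point]) -> Dict[int, List[Point]]:
    """Hypothetical patching: assign each point to one of four quadrants around the ego origin.

    Quadrant ids: 0 = (x>=0, y>=0), 1 = (x<0, y>=0), 2 = (x<0, y<0), 3 = (x>=0, y<0).
    """
    patches: Dict[int, List[Point]] = {0: [], 1: [], 2: [], 3: []}
    for p in points:
        x, y, _ = p
        if x >= 0 and y >= 0:
            q = 0
        elif x < 0 and y >= 0:
            q = 1
        elif x < 0:  # here y < 0
            q = 2
        else:  # x >= 0 and y < 0
            q = 3
        patches[q].append(p)
    return patches


def patch_bounds(q: int, half_range: float) -> Bounds:
    """BEV extent of quadrant q for a scene covering [-half_range, half_range]^2."""
    x_min = 0.0 if q in (0, 3) else -half_range
    y_min = 0.0 if q in (0, 1) else -half_range
    return x_min, y_min, x_min + half_range, y_min + half_range


def voxelize_patch(points: List[Point], bounds: Bounds, voxel_size: float):
    """Group points into BEV voxels (z is ignored); return the grid shape and occupied voxels.

    Points outside the bounds are dropped. math.floor is used so that slightly
    negative offsets map to -1 (and are discarded) instead of truncating to 0.
    """
    x_min, y_min, x_max, y_max = bounds
    nx = round((x_max - x_min) / voxel_size)
    ny = round((y_max - y_min) / voxel_size)
    voxels: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in points:
        ix = math.floor((p[0] - x_min) / voxel_size)
        iy = math.floor((p[1] - y_min) / voxel_size)
        if 0 <= ix < nx and 0 <= iy < ny:
            voxels[(ix, iy)].append(p)
    return (nx, ny), dict(voxels)


if __name__ == "__main__":
    half_range = 75.2  # hypothetical scene extent: [-75.2, 75.2] m in x and y
    scan = [(10.01, 5.02, 0.5), (-3.0, 2.0, 1.0), (-1.0, -1.0, 0.0), (4.0, -7.0, 0.2)]
    # Each quadrant is processed one after another at a fine 0.05 m voxel size.
    for q, pts in split_into_quadrants(scan).items():
        shape, voxels = voxelize_patch(pts, patch_bounds(q, half_range), 0.05)
        print(q, shape, sorted(voxels))
```

With these assumed numbers, each 75.2 m quadrant voxelized at 0.05 m gives a 1504 × 1504 BEV grid, the same size as the whole scene at 0.1 m. The per-patch normalization and alignment that Patch Normalizer and Quadrant Align address are not modeled here.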

Paper Structure

This paper contains 30 sections, 3 equations, 5 figures, and 16 tables.

Figures (5)

  • Figure 1: Our semi-supervised 3D object detection framework comprises two phases. In phase 1, we train a high-performance PatchTeacher. It performs partial-scene detection, which enables very high-resolution voxelization and substantially improves detection quality. Three practical techniques and SSL are used to further boost the performance of PatchTeacher. In phase 2, the high-quality pseudo labels produced by PatchTeacher are used to supervise the student model. To make full use of these pseudo labels, we propose PillarMix, which cross-mixes pillars from different LiDAR scans as a strong data augmentation. Semi-sampling and common data augmentations are then applied. Note that semi-sampling is an improved version of PseudoAugment that we developed; details are given in the implementation details.
  • Figure 2: Illustration of Fovea Selection.
  • Figure 3: Illustration of Semi-Sampling.
  • Figure 4: Different kinds of point clouds.
  • Figure 5: Comparison of SECOND, PseudoAugment, and our PTPM under the Waymo 5% protocol. Ground-truth boxes and predictions are shown in red and green, respectively. Points in ground-truth boxes are also rendered in red.
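The PillarMix step in Figure 1 (mixing pillars of different LiDAR scans crossly) can be sketched as below. This is a hedged interpretation, not the authors' implementation. The 5 m pillar size, the per-pillar random choice with probability ratio, the assumption that both scans are already in a common ego frame, and the helper names (pillar_id, pillar_mix) are all hypothetical. Pseudo-label boxes, which a real version must keep consistent with the mixed points (including boxes cut by pillar boundaries), are not handled.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in a shared ego frame (assumed)
PillarId = Tuple[int, int]


def pillar_id(p: Point, pillar_size: float) -> PillarId:
    """Integer BEV pillar coordinates of a point; floor handles negative x/y correctly."""
    return math.floor(p[0] / pillar_size), math.floor(p[1] / pillar_size)


def pillar_mix(scan_a: List[Point], scan_b: List[Point],
               pillar_size: float = 5.0, ratio: float = 0.5,
               seed: int = 0) -> Tuple[List[Point], Set[PillarId]]:
    """Mix two scans pillar-wise: every pillar takes its points from exactly one scan.

    Each occupied pillar is assigned to scan_b with probability `ratio`,
    otherwise to scan_a. Returns the mixed points and the set of pillars drawn from scan_b.
    """
    rng = random.Random(seed)
    occupied = {pillar_id(p, pillar_size) for p in scan_a} | {pillar_id(p, pillar_size) for p in scan_b}
    # Sort so the random assignment is reproducible for a given seed.
    from_b = {c for c in sorted(occupied) if rng.random() < ratio}
    mixed = [p for p in scan_a if pillar_id(p, pillar_size) not in from_b]
    mixed += [p for p in scan_b if pillar_id(p, pillar_size) in from_b]
    return mixed, from_b
```

Assigning whole pillars rather than individual points keeps the local geometry inside each pillar intact while still producing a scene that neither source scan contains.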