
Power of Cooperative Supervision: Multiple Teachers Framework for Enhanced 3D Semi-Supervised Object Detection

Jin-Hee Lee, Jae-Keun Lee, Je-Seok Kim, Soon Kwon

TL;DR

This work tackles high-performance 3D object detection under limited labels by introducing MultipleTeachers, a semi-supervised framework, together with a novel pie-based augmentation (PieAug) that improves pseudo-label quality and generalization. It uses category-specific teachers (Car, Pedestrian, Cyclist) that jointly produce pseudo-labels through MPGen, a multiple-teachers-based pseudo-label generator, and a three-stage training pipeline (Burn-In, Fine-tuning, Mutual Learning) with a category-wise EMA to refine detections collaboratively. The authors also present the LiO LiDAR Open Dataset, a diverse, balanced benchmark with a 360° field of view and extensive labeled and unlabeled frames, intended to drive semi-supervised learning (SSL) research in autonomous driving. Across KITTI, the Waymo Open Dataset, and LiO, the method achieves state-of-the-art results, with notable gains on small objects and robustness across different detectors, which is practically relevant for safer urban autonomous driving.
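
The category-wise EMA mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that each teacher and the student are flat dictionaries of named scalar parameters, that names prefixed `backbone.` are shared, and that each teacher additionally owns a `<category>_head.` block whose names also exist in the student. The function names and the decay value are illustrative, not taken from the authors' code.

```python
from typing import Dict

Params = Dict[str, float]  # hypothetical flat parameter store: name -> value


def ema_update(teacher: Params, student: Params, category: str,
               decay: float = 0.999) -> Params:
    """Move the teacher's shared backbone and its own category head toward
    the student by exponential moving average; leave other entries as-is."""
    updated = dict(teacher)  # do not mutate the caller's teacher
    for name, t_val in teacher.items():
        shared = name.startswith("backbone.")
        own_head = name.startswith(f"{category}_head.")
        if (shared or own_head) and name in student:
            updated[name] = decay * t_val + (1.0 - decay) * student[name]
    return updated


def category_wise_ema(teachers: Dict[str, Params], student: Params,
                      decay: float = 0.999) -> Dict[str, Params]:
    """Apply the EMA update independently to every category teacher."""
    return {cat: ema_update(p, student, cat, decay) for cat, p in teachers.items()}


if __name__ == "__main__":
    teachers = {"car": {"backbone.w": 1.0, "car_head.w": 2.0},
                "ped": {"backbone.w": 1.0, "ped_head.w": 3.0}}
    student = {"backbone.w": 0.0, "car_head.w": 0.0, "ped_head.w": 0.0}
    new_teachers = category_wise_ema(teachers, student, decay=0.9)
    print(new_teachers["car"], new_teachers["ped"])
```

Restricting each teacher's update to the shared backbone and its own head is only one plausible reading of "category-wise"; the paper's actual update rule may differ.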

Abstract

To ensure safe urban driving for autonomous platforms, it is crucial not only to develop high-performance object detection techniques but also to establish a diverse and representative dataset that captures various urban environments and object characteristics. To address these two issues, we have constructed a multi-class 3D LiDAR dataset reflecting diverse urban environments and object characteristics, and developed a robust 3D semi-supervised object detection (SSOD) method based on a multiple teachers framework. This SSOD framework groups similar classes into categories and assigns a specialized teacher to each category. Through collaborative supervision among these category-specialized teachers, the student network becomes increasingly proficient, leading to a highly effective object detector. We propose a simple yet effective augmentation technique, Pie-based Point Compensating Augmentation (PieAug), to enable the teacher network to generate high-quality pseudo-labels. Extensive experiments on the Waymo Open Dataset (WOD), KITTI, and our dataset validate the effectiveness of our proposed method and the quality of our dataset. Experimental results demonstrate that our approach consistently outperforms existing state-of-the-art 3D semi-supervised object detection methods across all datasets. We plan to release our multi-class LiDAR dataset and the source code on our GitHub repository in the near future.
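
The idea of grouping similar classes and letting each specialized teacher supervise only its own category can be sketched as a pseudo-label merging step. In this sketch the class-to-category mapping, the per-category confidence thresholds, and the `Box` fields are all hypothetical; the actual MPGen may additionally apply NMS or box fusion across teachers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    label: str                          # predicted class name
    score: float                        # confidence in [0, 1]
    center: Tuple[float, float, float]  # (x, y, z); size and heading omitted


# Hypothetical grouping of similar classes into teacher categories.
CATEGORY_OF: Dict[str, str] = {
    "Car": "car", "Truck": "car",
    "Pedestrian": "ped",
    "Cyclist": "cyc", "Motorcyclist": "cyc",
}


def merge_pseudo_labels(teacher_preds: Dict[str, List[Box]],
                        thresholds: Dict[str, float]) -> List[Box]:
    """Keep each teacher's confident predictions for its own category only."""
    pseudo: List[Box] = []
    for category, preds in teacher_preds.items():
        for box in preds:
            in_category = CATEGORY_OF.get(box.label) == category
            if in_category and box.score >= thresholds[category]:
                pseudo.append(box)
    return pseudo


if __name__ == "__main__":
    preds = {
        "car": [Box("Car", 0.9, (0.0, 0.0, 0.0)),
                Box("Pedestrian", 0.95, (1.0, 1.0, 0.0))],  # outside car teacher's category
        "ped": [Box("Pedestrian", 0.7, (1.0, 1.0, 0.0))],
        "cyc": [Box("Cyclist", 0.3, (3.0, 3.0, 0.0))],        # below threshold
    }
    labels = merge_pseudo_labels(preds, {"car": 0.5, "ped": 0.6, "cyc": 0.5})
    print([b.label for b in labels])  # ['Car', 'Pedestrian']
```

Discarding a teacher's out-of-category predictions is what lets each teacher act as a specialist: the pedestrian box proposed by the car teacher is ignored even though its score is high.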

Paper Structure

This paper contains 22 sections, 6 equations, 7 figures, 10 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison between the previous method and our MultipleTeachers framework. (a) uses a single teacher-student paradigm to generate pseudo-labels with flip and rotation augmentations. In contrast, (b) employs a multiple-teachers-student paradigm to generate more precise pseudo-labels using the novel strong augmentation, PieAug. (c) shows the 3D detection results predicted by the vanilla 3D SSOD method. (d) shows the 3D detection results predicted by our proposed MultipleTeachers. Note that 'PGen' refers to the pseudo-label generator, and 'MPGen' denotes the multiple-teachers-based pseudo-label generator.
  • Figure 2: Overview of the MultipleTeachers framework for 3D semi-supervised object detection. The framework consists of multiple teacher networks and a single student network. Objects with different aspect ratios and shapes are grouped into specific categories, and each teacher network specializes in one category, enabling more accurate predictions. By combining these category-specialized teachers, the framework generates high-accuracy pseudo-labels to train the student network. In addition to these high-quality pseudo-labels, the student network is trained with a novel data augmentation strategy, PieAug, which helps it detect occluded or distant objects in LiDAR data.
  • Figure 3: 2-stage training process for the MultipleTeachers framework. In stage (a), the baseline detector is pre-trained on labeled data. Subsequently, the parameters and structure of this pre-trained detector are duplicated across three specialized teacher networks: the Car Teacher for car categories, the Ped. Teacher for pedestrian categories, and the Cyc. Teacher for cyclist categories. In stage (b), each teacher network fine-tunes its category-specific parameters and corresponding head (i.e., Car Head, Ped. Head, Cyc. Head) to predict pseudo-labels more accurately within its designated category. This process ensures that each teacher generates more precise pseudo-labels.
  • Figure 4: Illustration of the PieAug process. Point clouds are divided into several pies of equal size, and each pie is classified as sparse or dense according to the density of objects within it. The points from dense pies compensate for the lack of points in sparse pies, allowing the student network to learn from enriched point features. Note that the gray boxes represent boxes before augmentation, while the brown boxes indicate boxes newly added through augmentation.
  • Figure 5: Number of annotated 3D bounding boxes per category for each frame. Each frame contains 45 boxes for WOD, 38 for H3D, 15 for ONCE, 5 for KITTI, and 36 for our LiO dataset.
  • ...and 2 more figures
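
The PieAug idea in Figure 4 can be sketched as follows. The sketch assumes, hypothetically, that pies are equal angular sectors around the sensor origin, that a pie counts as dense when it contains at least `dense_min` annotated box centres, and that each sparse pie is compensated by rotating a copy of a randomly chosen dense pie's points and boxes into it. The sector count, density rule, and function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor frame
Center = Tuple[float, float]        # box centre (x, y); size and heading omitted


def pie_index(x: float, y: float, num_pies: int) -> int:
    """Index of the equal-angle sector (pie) that contains (x, y)."""
    angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi)
    return min(int(angle / (2 * math.pi / num_pies)), num_pies - 1)


def rotate(x: float, y: float, theta: float) -> Tuple[float, float]:
    """Rotate (x, y) about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def pie_aug(points: List[Point], boxes: List[Center], num_pies: int = 8,
            dense_min: int = 2, seed: int = 0) -> Tuple[List[Point], List[Center]]:
    """Hypothetical PieAug sketch: copy dense-pie content into sparse pies."""
    rng = random.Random(seed)
    width = 2 * math.pi / num_pies

    # Classify each pie by how many annotated objects fall inside it.
    counts = [0] * num_pies
    for bx, by in boxes:
        counts[pie_index(bx, by, num_pies)] += 1
    dense = [i for i, c in enumerate(counts) if c >= dense_min]
    sparse = [i for i, c in enumerate(counts) if c < dense_min]
    if not dense:
        return list(points), list(boxes)  # nothing to borrow from

    out_points, out_boxes = list(points), list(boxes)
    for dst in sparse:
        src = rng.choice(dense)
        theta = (dst - src) * width  # rotation that maps pie src onto pie dst
        for x, y, z in points:
            if pie_index(x, y, num_pies) == src:
                rx, ry = rotate(x, y, theta)
                out_points.append((rx, ry, z))  # height is kept unchanged
        for bx, by in boxes:
            if pie_index(bx, by, num_pies) == src:
                out_boxes.append(rotate(bx, by, theta))
    return out_points, out_boxes


if __name__ == "__main__":
    pts = [(10.0, 1.0, 0.0), (5.0, 0.5, -1.0)]
    bxs = [(10.0, 1.0), (5.0, 0.5)]
    aug_pts, aug_bxs = pie_aug(pts, bxs, num_pies=4, dense_min=2)
    print(len(aug_pts), len(aug_bxs))  # 8 8
```

A practical version would also need to handle collisions between copied boxes and objects already present in the sparse pie, and would carry full box parameters (size, heading) rather than centres only.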