Semi-Supervised 3D Object Detection with Channel Augmentation using Transformation Equivariance

Minju Kang, Taehun Kong, Tae-Kyun Kim

TL;DR

A novel teacher-student framework employing channel augmentation for 3D semi-supervised object detection using the transformation equivariance detector (TED), which achieves a significant performance leap, surpassing SOTA 3D semi-supervised object detection models.

Abstract

Accurate 3D object detection is crucial for autonomous vehicles and robots to navigate and interact with the environment safely and effectively. However, the performance of 3D detectors depends on the amount of training data and on annotations, which are expensive to obtain. Consequently, the demand for training with limited labeled data is growing. We explore a novel teacher-student framework employing channel augmentation for 3D semi-supervised object detection. Teacher-student semi-supervised learning (SSL) typically applies weak augmentation to the teacher and strong augmentation to the student. In this work, we apply multiple channel augmentations to both networks using the transformation equivariance detector (TED). TED allows us to explore different combinations of augmentations on point clouds and efficiently aggregates multi-channel transformation-equivariant features. In principle, adopting fixed channel augmentations for the teacher network lets the student train stably on reliable pseudo-labels. Adopting strong channel augmentations enriches the diversity of the data, fostering robustness to transformations and improving the generalization performance of the student network. We use SOTA hierarchical supervision as a baseline and adapt its dual-threshold strategy to TED, which we call channel IoU consistency. We evaluate our method on the KITTI dataset and achieve a significant performance leap, surpassing SOTA 3D semi-supervised object detection models.
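
The channel-augmentation idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the function names (`make_channels`, `teacher_channels`, `student_channels`), the choice of z-axis rotation and y-flip as the transformations, the specific fixed angles, and the reading that the teacher receives the fixed set while the student receives randomly sampled transforms. It only shows how one point cloud might be expanded into several transformed "channels"; the detector and the feature aggregation performed by TED are not modeled.

```python
import numpy as np


def rotate_z(points, angle):
    """Rotate an (N, 3) point cloud about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T


def flip_y(points):
    """Mirror the point cloud across the x-z plane (negate y)."""
    out = points.copy()
    out[:, 1] = -out[:, 1]
    return out


def make_channels(points, angles, flips):
    """Build one transformed copy of the cloud per (angle, flip) pair."""
    channels = []
    for angle, flip in zip(angles, flips):
        p = flip_y(points) if flip else points
        channels.append(rotate_z(p, angle))
    return np.stack(channels)  # shape (C, N, 3)


def teacher_channels(points):
    """Fixed channel transforms (hypothetical values), identical every call."""
    angles = [0.0, np.pi / 2, np.pi]
    flips = [False, True, False]
    return make_channels(points, angles, flips)


def student_channels(points, rng, num_channels=3, max_angle=np.pi / 4):
    """Randomly sampled channel transforms (hypothetical 'strong' setting)."""
    angles = rng.uniform(-max_angle, max_angle, size=num_channels)
    flips = rng.random(num_channels) < 0.5
    return make_channels(points, angles, flips)
```

For a cloud `pts` of shape `(N, 3)`, `teacher_channels(pts)` returns the same `(3, N, 3)` array on every call, whereas `student_channels(pts, np.random.default_rng(0))` draws fresh transforms each time. Because rotations and flips are rigid, each channel preserves point-to-origin distances.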

Paper Structure

This paper contains 16 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed method. It augments input channels to the teacher and student and aggregates them using transformation equivariance features as in TED. HSSDA is applied with the pseudo-box qualities based on TED.
  • Figure 2: IoU consistency comparison.
  • Figure 3: Qualitative comparisons of pseudo-boxes on KITTI. Ground truth bounding boxes appear in red, our predicted pseudo-boxes in cyan, and HSSDA's pseudo-boxes in green.
  • Figure 4: The total number of incorrect pseudo-boxes on the KITTI dataset. The top plot shows the number of incorrect predictions made by the teacher model of our method and of HSSDA across training epochs. The bottom plot shows the same count after pseudo-box filtering.