Semi-Supervised 3D Object Detection with Channel Augmentation using Transformation Equivariance

Minju Kang; Taehun Kong; Tae-Kyun Kim

TL;DR

We propose a novel teacher-student framework that employs channel augmentation for semi-supervised 3D object detection, built on the transformation equivariance detector (TED). It achieves a significant performance gain, surpassing SOTA semi-supervised 3D object detection models.

Abstract

Accurate 3D object detection is crucial for autonomous vehicles and robots to navigate and interact with their environment safely and effectively. However, the performance of a 3D detector depends on the amount of training data and on annotations, which are expensive to obtain. Consequently, demand for training with limited labeled data is growing. We explore a novel teacher-student framework that employs channel augmentation for semi-supervised 3D object detection. Teacher-student SSL typically applies weak augmentation to the teacher and strong augmentation to the student. In this work, we instead apply multiple channel augmentations to both networks using the transformation equivariance detector (TED). TED allows us to explore different combinations of augmentations on point clouds and efficiently aggregates multi-channel transformation equivariance features. In principle, adopting fixed channel augmentations for the teacher network lets the student train stably on reliable pseudo-labels, while adopting strong channel augmentations for the student enriches data diversity, fostering robustness to transformations and improving the student's generalization. We use SOTA hierarchical supervision as a baseline and adapt its dual-threshold strategy to TED; we call the adapted criterion channel IoU consistency. We evaluate our method on the KITTI dataset and achieve a significant performance gain, surpassing SOTA semi-supervised 3D object detection models.
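The dual-threshold idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define channel IoU consistency, so everything below is an assumption made for illustration: boxes are simplified to axis-aligned bird's-eye-view rectangles, the consistency score is taken to be the mean IoU between a reference pseudo-box and the boxes predicted for the same object in each transformation channel (already mapped back to a common frame), and the names `bev_iou`, `channel_consistency`, `split_by_dual_threshold`, and the threshold values are hypothetical.

```python
from typing import List, Tuple

# Simplified axis-aligned BEV box: (x_min, y_min, x_max, y_max).
# Real 3D boxes are rotated and have height; this is an illustrative assumption.
Box = Tuple[float, float, float, float]


def bev_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def channel_consistency(reference: Box, per_channel: List[Box]) -> float:
    """Mean IoU between a reference box and the same object's box from each
    channel (hypothetical scoring rule, not taken from the paper)."""
    if not per_channel:
        return 0.0
    return sum(bev_iou(reference, b) for b in per_channel) / len(per_channel)


def split_by_dual_threshold(
    scores: List[float], low: float, high: float
) -> Tuple[List[int], List[int], List[int]]:
    """Split box indices into high-quality (score >= high),
    ambiguous (low <= score < high), and discarded (score < low)."""
    confident, ambiguous, discarded = [], [], []
    for i, s in enumerate(scores):
        if s >= high:
            confident.append(i)
        elif s >= low:
            ambiguous.append(i)
        else:
            discarded.append(i)
    return confident, ambiguous, discarded


# Usage: one pseudo-box compared against its predictions in two channels.
score = channel_consistency((0, 0, 2, 2), [(0, 0, 2, 2), (1, 0, 3, 2)])
groups = split_by_dual_threshold([0.9, 0.5, 0.1], low=0.3, high=0.7)
```

Under this reading, pseudo-boxes that agree across channels would supervise the student directly, while the ambiguous band would be treated more cautiously; how the paper actually handles each group is not stated in this section.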

Paper Structure (16 sections, 6 equations, 4 figures, 2 tables)

Introduction
Related Work
Semi-Supervised Learning
3D Semi-Supervised Object Detection
SSL using TED and hierarchical supervision
Method Overview
Notations for the teacher-student SSL
Background: Transformation Equivariant Detector
Learning with Transformation Channels
Training Objectives
Experiments
Dataset and Metrics
Implementation Details
Main Results
Ablation Study
...and 1 more sections

Figures (4)

Figure 1: Overview of the proposed method. It augments input channels to the teacher and student and aggregates them using transformation equivariance features as in TED. HSSDA is applied with the pseudo-box qualities based on TED.
Figure 2: IoU consistency comparison.
Figure 3: Qualitative comparisons of pseudo-boxes on KITTI. Ground truth bounding boxes appear in red, our predicted pseudo-boxes in cyan, and HSSDA's pseudo-boxes in green.
Figure 4: The total number of incorrect pseudo-boxes on the KITTI dataset. The top plot shows the number of incorrect predictions made by the teacher model of our method and of HSSDA across training epochs. The bottom plot shows the same count after pseudo-box filtering.
