Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection
Shuai Zeng, Wenzhao Zheng, Jiwen Lu, Haibin Yan
TL;DR
This paper tackles the data-hungry nature of 3D object detection by introducing hardness-aware scene synthesis (HASS), which maintains an online pseudo-database of pseudo-labeled objects and synthesizes diverse scenes by composing unlabeled foreground objects with labeled backgrounds. The approach uses a two-stage synthesis (easy, then hard) and a dynamic pseudo-database whose filtering threshold gradually shifts from high to low while object insertion shifts from sparse to dense, so the training data becomes progressively harder. Key contributions include using pseudo-labels from a trained teacher for scene synthesis, a sparse-to-dense synthesis strategy, and extensive ablations showing improved generalization on KITTI and Waymo with limited labels, all without extra inference overhead. The work demonstrates that carefully controlled synthetic scene generation, guided by pseudo-label quality and a curriculum-like increase in density, can substantially boost semi-supervised 3D detection performance in autonomous driving settings.
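The progressive hardening described above can be pictured as a schedule that maps training progress to a pseudo-label score threshold and an insertion count. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `HardnessSchedule`, the helper `filter_pseudo_labels`, the linear interpolation, and every numeric default are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HardnessSchedule:
    """Hypothetical linear curriculum; all default values are illustrative."""

    start_threshold: float = 0.9  # strict pseudo-label filter early in training
    end_threshold: float = 0.6    # looser filter late in training
    start_objects: int = 2        # sparse object insertion early
    end_objects: int = 10         # dense object insertion late

    def at(self, progress: float) -> Tuple[float, int]:
        """Return (score threshold, objects per scene) for progress in [0, 1]."""
        p = min(max(progress, 0.0), 1.0)  # clamp out-of-range progress
        threshold = self.start_threshold + p * (self.end_threshold - self.start_threshold)
        n_objects = round(self.start_objects + p * (self.end_objects - self.start_objects))
        return threshold, n_objects


def filter_pseudo_labels(pseudo_labels: List[Dict], threshold: float) -> List[Dict]:
    """Keep pseudo-labels whose confidence score reaches the current threshold."""
    return [label for label in pseudo_labels if label["score"] >= threshold]
```

Early in training such a schedule admits only a few high-confidence objects per scene; later it admits more objects with lower confidence, which matches the easy-to-hard progression summarized above. Whether the actual method interpolates linearly or switches in discrete stages is not stated here.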
Abstract
3D object detection aims to recover the 3D information of objects of interest and is a fundamental task in autonomous driving perception. Its performance depends heavily on the scale of labeled training data, yet high-quality annotations for point cloud data are costly to obtain. While conventional methods focus on generating pseudo-labels for unlabeled samples as supplementary training data, the structural nature of 3D point cloud data makes it possible to compose objects and backgrounds into realistic synthetic scenes. Motivated by this, we propose a hardness-aware scene synthesis (HASS) method that generates adaptive synthetic scenes to improve the generalization of detection models. We obtain pseudo-labels for unlabeled objects and generate diverse scenes from different compositions of objects and backgrounds. Because scene synthesis is sensitive to the quality of pseudo-labels, we further propose a hardness-aware strategy that reduces the effect of low-quality pseudo-labels and maintains a dynamic pseudo-database to ensure the diversity and quality of synthetic scenes. Extensive experiments on the widely used KITTI and Waymo datasets demonstrate the superiority of the proposed HASS method, which outperforms existing semi-supervised learning methods on 3D object detection. Code: https://github.com/wzzheng/HASS.
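To make the object-background composition concrete, the sketch below pastes pseudo-labeled objects into a labeled background scene while skipping candidates that collide with boxes already present. It is an illustration under simplifying assumptions, not the released HASS code: boxes are axis-aligned bird's-eye-view rectangles (real detectors use rotated 3D boxes), occlusion handling is ignored, and the names `boxes_overlap` and `synthesize_scene` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]          # (x, y, z)
Box = Tuple[float, float, float, float]     # assumed BEV box: (x_min, y_min, x_max, y_max)
Candidate = Tuple[List[Point], Box, float]  # (object points, box, pseudo-label score)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned BEV boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def synthesize_scene(
    background_points: List[Point],
    background_boxes: List[Box],
    candidates: List[Candidate],
    max_objects: int,
) -> Tuple[List[Point], List[Box]]:
    """Insert up to max_objects non-colliding candidates, highest score first."""
    points = list(background_points)
    boxes = list(background_boxes)
    inserted = 0
    for obj_points, obj_box, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if inserted >= max_objects:
            break
        if any(boxes_overlap(obj_box, existing) for existing in boxes):
            continue  # skip objects that would intersect an existing box
        points.extend(obj_points)
        boxes.append(obj_box)
        inserted += 1
    return points, boxes
```

Raising `max_objects` over the course of training yields denser scenes, and lowering the score cut-off applied to `candidates` beforehand admits harder pseudo-labeled objects.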
