TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection

Philip Jacobson, Yichen Xie, Mingyu Ding, Chenfeng Xu, Masayoshi Tomizuka, Wei Zhan, Ming C. Wu

TL;DR

This work improves pseudo-label quality by exploiting the long-term temporal information captured in driving scenes: pre-trained motion-forecasting models generate object trajectories on pseudo-labeled data, which are then used to enhance student model training.

Abstract

Semi-supervised 3D object detection is a common strategy employed to circumvent the challenge of manually labeling large-scale autonomous driving perception datasets. Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework in which machine-generated pseudo-labels on a large unlabeled dataset are used in combination with a small manually-labeled dataset for training. In this work, we address the problem of improving pseudo-label quality through leveraging long-term temporal information captured in driving scenes. More specifically, we leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data to further enhance the student model training. Our approach improves pseudo-label quality in two distinct manners: first, we suppress false positive pseudo-labels through establishing consistency across multiple frames of motion forecasting outputs. Second, we compensate for false negative detections by directly inserting predicted object tracks into the pseudo-labeled scene. Experiments on the nuScenes dataset demonstrate the effectiveness of our approach, improving the performance of standard semi-supervised approaches in a variety of settings.

Paper Structure (17 sections, 5 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Comparison between a scene containing only teacher-generated pseudo-labels (in green) and the same scene augmented with both pseudo-labels and predicted trajectory boxes (in red). Overlapping red and green boxes indicate pseudo-labels with a high degree of temporal consistency, which are further emphasized during student training. Green boxes without overlap indicate pseudo-labels with a low degree of temporal consistency, which are therefore more likely to be false positive detections. Unmatched red boxes indicate potential missed detections by the teacher model and are also added as soft targets during training.
  • Figure 2: Overview of our proposed method, TrajSSL. In addition to a teacher-student SSL framework, we introduce a trajectory prediction model (AgentFormer) which predicts future object trajectories based on past pseudo-label tracks. The inference output of this model is combined with the perception pseudo-labels, and an IoU-matching process is performed. Pseudo-labels are then weighted during supervision based on the degree to which they agree with the forecasted trajectories. Meanwhile, predictions which do not match any existing pseudo-label are added to the training process as down-weighted pseudo-labels.
  • Figure 3: Illustration of the process of generating trajectories from pseudo-labels. First, we pre-train both our teacher detector model and our trajectory prediction model using the available labeled scene data. Next, we use the teacher model to run inference on the unlabeled scene data. We then link the produced pseudo-labels into tracks of objects across time. Lastly, we feed these tracks into the prediction model to generate synthetic trajectories.
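
The IoU-matching and weighting step described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: boxes are simplified to axis-aligned bird's-eye-view rectangles (real 3D detection boxes are rotated), matching is greedy, and the names `Box`, `bev_iou`, `weight_targets` as well as the threshold and weight values are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    # Axis-aligned bird's-eye-view box (a simplification; real boxes are rotated 3D boxes).
    x: float  # centre x
    y: float  # centre y
    w: float  # extent along x
    l: float  # extent along y


def bev_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned BEV boxes."""
    ix = max(0.0, min(a.x + a.w / 2, b.x + b.w / 2) - max(a.x - a.w / 2, b.x - b.w / 2))
    iy = max(0.0, min(a.y + a.l / 2, b.y + b.l / 2) - max(a.y - a.l / 2, b.y - b.l / 2))
    inter = ix * iy
    union = a.w * a.l + b.w * b.l - inter
    return inter / union if union > 0 else 0.0


def weight_targets(
    pseudo: List[Box],
    forecast: List[Box],
    iou_thresh: float = 0.5,   # hypothetical matching threshold
    matched_w: float = 1.0,    # hypothetical weight: pseudo-label agrees with a forecast
    unmatched_w: float = 0.5,  # hypothetical weight: pseudo-label has no forecast support
    inserted_w: float = 0.3,   # hypothetical weight: forecast box with no pseudo-label
) -> List[Tuple[Box, float]]:
    """Return (box, loss weight) training targets for one frame."""
    targets: List[Tuple[Box, float]] = []
    used = set()  # indices of forecast boxes already matched
    for p in pseudo:
        best_j, best_iou = -1, 0.0
        for j, f in enumerate(forecast):
            if j in used:
                continue
            iou = bev_iou(p, f)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_iou >= iou_thresh:
            used.add(best_j)
            targets.append((p, matched_w))    # temporally consistent pseudo-label
        else:
            targets.append((p, unmatched_w))  # possible false positive
    for j, f in enumerate(forecast):
        if j not in used:
            targets.append((f, inserted_w))   # possible missed detection, added as soft target
    return targets
```

For example, calling `weight_targets` with two pseudo-labels and two forecast boxes, where only the first pair overlaps, yields three targets: the supported pseudo-label at the matched weight, the unsupported pseudo-label at the reduced weight, and the leftover forecast box as a down-weighted insertion.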