DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications

Muhammad Shahbaz, Shaurya Agarwal

TL;DR

This work tackles the high labeling burden in roadside LiDAR object detection by introducing a self-supervised teacher–student framework. Multiple statistically modeled teachers generate noisy annotations through background filtering, clustering, and heuristic bounding-boxes, which train a robust student detector without human labeling. The approach demonstrates competitive performance with supervised detectors on public roadside datasets and emphasizes data augmentation across locations and perspectives. Its scalable, autonomous annotation pipeline has significant practical impact for deploying roadside perception systems at scale while reducing labeling costs.

Abstract

Recent advancements in deep-learning methods for object detection in point-cloud data have enabled numerous roadside applications, fostering improvements in transportation safety and management. However, the intricate nature of point-cloud data poses significant challenges for human-supervised labeling, resulting in substantial expenditures of time and capital. This paper addresses the issue by developing an end-to-end, scalable, and self-supervised framework for training deep object detectors tailored for roadside point-cloud data. The proposed framework leverages self-supervised, statistically modeled teachers to train off-the-shelf deep object detectors, thus circumventing the need for human supervision. The teacher models follow a fine-tuned set of standard practices (background filtering, object clustering, bounding-box fitting, and classification) to generate noisy labels. We show that training the student model over the combined noisy annotations from a multitude of teachers enhances its capacity to separate background from foreground and forces it to learn diverse point-cloud representations for the object categories of interest. Evaluations on publicly available roadside datasets with state-of-the-art deep object detectors demonstrate that the proposed framework achieves performance comparable to deep object detectors trained on human-annotated labels, despite using no human annotations in its training process.

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables, 2 algorithms.

Figures (3)

  • Figure 1: We gather point-cloud data from multiple locations using similar sensors (a), unify the data in a pre-processing stage (b), and use separate statistically modeled teachers, which differ in their hyper-parameters, (c) to generate a superset of weakly labeled datasets (d). We then train the deep object detector (student model) to generalize over the superset (e).
  • Figure 2: Label Generation Pipeline: The teacher model (left) applies a novel background filtering method followed by traditional DBSCAN clustering to generate object clusters. The clusters are classified using shape heuristics to create annotations, which the student model (right) uses as ground-truth labels.
  • Figure 3: Iterative Training for Improving Student Belief
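
The teacher pipeline named in Figures 1 and 2 (background filtering, DBSCAN clustering, box fitting, shape-heuristic classification, then several teachers with different hyper-parameters) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The voxel-occupancy background filter is a simple stand-in for the paper's novel filtering method, which this page does not describe. The axis-aligned boxes, the size thresholds in `classify`, and every function and parameter name are assumptions made for illustration.

```python
import math
from collections import deque

def filter_background(points, background_voxels, voxel=0.5):
    """Stand-in filter (assumption): drop points that fall in voxels
    occupied in a previously recorded background-only frame."""
    def key(p):
        return tuple(math.floor(c / voxel) for c in p)
    return [p for p in points if key(p) not in background_voxels]

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # noise for now; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster     # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

def fit_box(cluster_points):
    """Axis-aligned bounding box as (center, size); a simplification."""
    mins = [min(p[k] for p in cluster_points) for k in range(3)]
    maxs = [max(p[k] for p in cluster_points) for k in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size

def classify(size):
    """Shape heuristic with hypothetical thresholds in meters."""
    footprint = max(size[0], size[1])
    if footprint > 2.5:
        return "vehicle"
    if size[2] > 1.0 and footprint < 1.2:
        return "pedestrian"
    return "unknown"

def teacher(points, background_voxels, eps, min_pts):
    """One teacher: filter, cluster, fit boxes, classify."""
    foreground = filter_background(points, background_voxels)
    labels = dbscan(foreground, eps, min_pts)
    annotations = []
    for c in sorted(set(labels) - {-1}):
        members = [p for p, l in zip(foreground, labels) if l == c]
        center, size = fit_box(members)
        annotations.append({"center": center, "size": size,
                            "label": classify(size)})
    return annotations

def weak_label_superset(points, background_voxels, teacher_params):
    """Run one teacher per (eps, min_pts) setting and keep every output."""
    return {params: teacher(points, background_voxels, *params)
            for params in teacher_params}
```

Each entry of the returned dictionary is one teacher's noisy annotation set for the frame. In the framework described above, the student detector would be trained on the union of these sets rather than on any single teacher's output.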