Towards Stable 3D Object Detection

Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou

TL;DR

This work addresses the overlooked problem of temporal stability in 3D object detection for autonomous driving by introducing the Stability Index (SI), a comprehensive metric that jointly evaluates confidence, box localization, extent, and heading stability. The work analyzes the limitations of existing metrics and proposes a principled stability framework featuring projection with a pivot box, element decoupling, and aggregation to ensure symmetry and marginal unimodality. To improve stability, the authors propose Prediction Consistency Learning (PCL), a training strategy that enforces cross-frame prediction consistency under augmentations without adding inference cost. Empirical results on the Waymo Open Dataset show that SI complements mAPH and that PCL can significantly boost SI (e.g., CenterPoint vehicle SI from 80.52 to 86.00) while keeping accuracy unchanged or only modestly affected, indicating practical benefits for safer autonomous driving systems.

Abstract

In autonomous driving, the temporal stability of 3D object detection greatly impacts driving safety. However, detection stability cannot be assessed by existing metrics such as mAP and MOTA, and has consequently been little explored by the community. To bridge this gap, this work proposes the Stability Index (SI), a new metric that comprehensively evaluates the stability of 3D detectors in terms of confidence, box localization, extent, and heading. By benchmarking state-of-the-art object detectors on the Waymo Open Dataset, SI reveals properties of detection stability that other metrics have not uncovered. To help models improve their stability, we further introduce a general and effective training strategy called Prediction Consistency Learning (PCL). PCL encourages consistent predictions for the same objects across different timestamps and augmentations, leading to enhanced detection stability. We examine the effectiveness of PCL on the widely used CenterPoint and achieve an SI of 86.00 for the vehicle class, surpassing the baseline by 5.48. We hope our work can serve as a reliable baseline and draw the community's attention to this crucial issue in 3D object detection. Code will be made publicly available.
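To make the consistency idea concrete, the following Python sketch (NumPy only) computes a PCL-style penalty for one object observed in two frames, following the steps listed in the Figure 3 caption: undo each frame's augmentation, express the prediction's errors in the matched ground-truth object's own coordinate system, and penalize the difference between the two frames' errors. Every specific choice here is an assumption for illustration, not the authors' implementation: the box layout `[x, y, z, l, w, h, yaw]`, augmentation as a global z-rotation followed by uniform scaling, `1 - score` as the confidence error, log-ratio extent errors, and a plain L1 penalty. A real training loop would compute this with a differentiable framework over all matched pairs.

```python
import numpy as np

# Hypothetical box layout: [x, y, z, l, w, h, yaw].

def wrap_angle(a):
    """Wrap an angle (or array of angles) to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def augment(box, theta, scale):
    """Assumed augmentation: rotate about the z-axis by theta, then scale uniformly."""
    x, y, z, l, w, h, yaw = box
    c, s = np.cos(theta), np.sin(theta)
    x, y = c * x - s * y, s * x + c * y
    return np.array([x * scale, y * scale, z * scale,
                     l * scale, w * scale, h * scale, wrap_angle(yaw + theta)])

def de_augment(box, theta, scale):
    """Inverse of `augment`: undo the scaling, then the rotation."""
    x, y, z, l, w, h, yaw = box
    x, y, z, l, w, h = (v / scale for v in (x, y, z, l, w, h))
    c, s = np.cos(-theta), np.sin(-theta)
    x, y = c * x - s * y, s * x + c * y
    return np.array([x, y, z, l, w, h, wrap_angle(yaw - theta)])

def object_frame_errors(score, box, gt_box):
    """Errors of one de-augmented prediction, expressed in the GT object's own frame."""
    dx, dy, dz = box[:3] - gt_box[:3]
    c, s = np.cos(-gt_box[6]), np.sin(-gt_box[6])
    return {
        "conf": np.array([1.0 - score]),                          # assumed confidence error
        "loc": np.array([c * dx - s * dy, s * dx + c * dy, dz]),  # along/across the heading
        "ext": np.log(box[3:6] / gt_box[3:6]),                    # relative size error
        "head": np.array([wrap_angle(box[6] - gt_box[6])]),
    }

def consistency_loss(pred_a, pred_b, gt_a, gt_b, aug_a, aug_b):
    """L1 penalty on the disparity between two frames' object-frame errors.

    pred_*: (score, augmented box) for the same object at timestamps t and t'.
    gt_*:   that object's ground-truth box in each (un-augmented) frame.
    aug_*:  (theta, scale) used to augment each frame.
    """
    e_a = object_frame_errors(pred_a[0], de_augment(pred_a[1], *aug_a), gt_a)
    e_b = object_frame_errors(pred_b[0], de_augment(pred_b[1], *aug_b), gt_b)
    loss = 0.0
    for key in e_a:
        diff = e_a[key] - e_b[key]
        if key == "head":
            diff = wrap_angle(diff)  # compare headings on the circle
        loss += float(np.abs(diff).sum())
    return loss

# Usage: the object moves 2 m between frames, but the detector errs identically.
gt_a = np.array([10.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0])
gt_b = np.array([12.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0])
box_a = np.array([10.2, 0.1, 0.0, 4.1, 2.0, 1.5, 0.05])
box_b = np.array([12.2, 0.1, 0.0, 4.1, 2.0, 1.5, 0.05])
aug_a, aug_b = (0.3, 1.1), (-0.5, 0.95)
loss = consistency_loss((0.8, augment(box_a, *aug_a)), (0.8, augment(box_b, *aug_b)),
                        gt_a, gt_b, aug_a, aug_b)  # close to 0.0
```

Because errors are measured relative to each frame's own ground truth, an object that moves between `t` and `t'` incurs no penalty as long as the detector errs in the same way in both frames; only changes in the error are penalized.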

Paper Structure (15 sections, 2 theorems, 11 equations, 5 figures, 4 tables)

Key Result

Lemma

SI is a symmetric metric that uniformly assesses the influence of every element on detection stability.
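The toy Python functions below illustrate what symmetry and equal treatment of elements can look like in a stability score; they are not the SI formula, which this page does not reproduce. Two observations of the same object are compared element by element (confidence, center, extent, heading), each difference is mapped through `exp(-|delta| / tau)`, which is even in `delta` and peaks only at zero, and the per-element terms are averaged with equal weights. The element names, the per-element scales `tau`, and the exponential kernel are all hypothetical choices.

```python
import math

# Hypothetical element set; the actual SI decomposition may differ.
ELEMENTS = ("conf", "x", "y", "z", "l", "w", "h", "yaw")

def element_term(delta, tau):
    """Stability of one element in (0, 1]: even in delta, single peak at delta = 0."""
    return math.exp(-abs(delta) / tau)

def pair_stability(a, b, taus):
    """Equal-weight mean of decoupled per-element terms for two observations."""
    total = 0.0
    for k in ELEMENTS:
        delta = a[k] - b[k]
        if k == "yaw":
            delta = (delta + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        total += element_term(delta, taus[k])
    return total / len(ELEMENTS)

def track_stability(track, taus):
    """Average pair stability over consecutive observations (track needs >= 2 items)."""
    pairs = list(zip(track, track[1:]))
    return sum(pair_stability(a, b, taus) for a, b in pairs) / len(pairs)

taus = {k: 1.0 for k in ELEMENTS}
a = {"conf": 0.9, "x": 10.0, "y": 0.1, "z": 0.0, "l": 4.0, "w": 2.0, "h": 1.5, "yaw": 3.1}
b = {"conf": 0.7, "x": 10.4, "y": 0.0, "z": 0.0, "l": 4.2, "w": 2.0, "h": 1.5, "yaw": -3.1}
same_either_way = math.isclose(pair_stability(a, b, taus), pair_stability(b, a, taus))
```

Swapping the two observations, or reversing the order of a whole track, leaves the score unchanged up to floating-point rounding, and identical observations score exactly 1.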

Figures (5)

  • Figure 1: Visualizations of potential safety threats caused by detection instability. On the left, confidence fluctuations lead to flickering boxes, which cause inaccurate object association and, in turn, an abnormal velocity estimate. On the right, the shaking boxes cause the vehicle to be erroneously forecast as intending to merge into traffic, although it is actually stationary. Here, dashed boxes represent the ground truths. Detection results are predicted by SECOND yan2018second, and object tracking is conducted with SimpleTrack pang2021simpletrack.
  • Figure 2: The procedure for computing the Stability Index. The orange and blue boxes represent the best matches between the predictions and the ground truths found by the Hungarian algorithm. These boxes are then associated across frames using their object ID labels. After projecting the predictions onto a pre-built pivot box, SI decouples them into element-wise computations, which are then aggregated into the final assessment of detection stability.
  • Figure 3: The pipeline of the proposed Prediction Consistency Learning (PCL). In each iteration, PCL samples a pair of frames at neighboring timestamps $t$ and $t'$ and applies augmentations $\mathbf{M}$ and $\mathbf{M}'$ to the paired samples. GT-prediction matching and cross-frame matching then jointly associate the detector's predictions for the same objects across the two frames. After de-augmentation, PCL computes the prediction errors in confidence, localization, extent, and heading, which are defined in the object's own coordinate system. Finally, PCL penalizes the disparities in these errors across all prediction pairs to enforce temporal consistency. In the figure, pred. and aug. denote prediction and augmentation, respectively.
  • Figure 4: Relationships between object properties and detection stability.
  • Figure 5: Visualizations of ground-truths (in pink) and predictions of CenterPoint models trained by the baseline (in orange), multi-frame strategy (in green), and PCL strategy (in blue). Predicted confidences (top row) and 3D boxes (bottom row) are all presented.
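The matching and association steps in the Figure 2 caption can be sketched in Python with SciPy's `linear_sum_assignment`, which solves the same optimal assignment problem as the Hungarian algorithm. Each frame's predictions are matched one-to-one to ground truths, and the matches are then grouped across frames by ground-truth object ID, giving the per-object prediction sequences a stability metric would consume. The bird's-eye-view center-distance cost, the `max_dist` gate, and the function names are assumptions for illustration; the page does not state which matching cost SI uses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_frame(pred_centers, gt_centers, gt_ids, max_dist=2.0):
    """Optimal one-to-one matching of predictions to GTs within one frame.

    Cost is the bird's-eye-view (x, y) center distance (an assumption); matched
    pairs farther apart than max_dist are discarded. Returns {gt_id: pred_index}.
    """
    pred_centers = np.asarray(pred_centers, dtype=float).reshape(-1, 3)
    gt_centers = np.asarray(gt_centers, dtype=float).reshape(-1, 3)
    if len(pred_centers) == 0 or len(gt_centers) == 0:
        return {}
    cost = np.linalg.norm(pred_centers[:, None, :2] - gt_centers[None, :, :2], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return {gt_ids[c]: int(r) for r, c in zip(rows, cols) if cost[r, c] <= max_dist}

def associate_across_frames(frames, max_dist=2.0):
    """Group per-frame matches by GT object ID.

    frames: list of (pred_centers, gt_centers, gt_ids), one entry per timestamp.
    Returns {gt_id: [(frame_index, pred_index), ...]} in time order.
    """
    tracks = {}
    for t, (preds, gts, ids) in enumerate(frames):
        for gid, p in match_frame(preds, gts, ids, max_dist).items():
            tracks.setdefault(gid, []).append((t, p))
    return tracks

frames = [
    ([[0, 0, 0], [10, 0, 0]], [[10.3, 0, 0], [0.2, 0, 0]], ["car_a", "car_b"]),
    ([[10.5, 0, 0]], [[10.6, 0, 0], [0.4, 0, 0]], ["car_a", "car_b"]),
    ([[5, 5, 0]], [[10.8, 0, 0]], ["car_a"]),
]
tracks = associate_across_frames(frames)
```

Ground truths with no accepted match in a frame (for example, a missed detection such as `car_b` in the second frame above) simply leave a gap in that object's sequence.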

Theorems & Definitions (2)

  • Lemma
  • Lemma