Effort-Based Criticality Metrics for Evaluating 3D Perception Errors in Autonomous Driving

Sharang Kaul, Simon Bultmann, Mario Berk, Abhinav Valada

Abstract

Criticality metrics such as time-to-collision (TTC) quantify collision urgency but conflate the consequences of false-positive (FP) and false-negative (FN) perception errors. We propose two novel effort-based metrics: False Speed Reduction (FSR), the cumulative velocity loss from persistent phantom detections, and Maximum Deceleration Rate (MDR), the peak braking demand from missed objects under a constant-acceleration model. These longitudinal metrics are complemented by Lateral Evasion Acceleration (LEA), adapted from prior lateral evasion kinematics and coupled with reachability-based collision timing to quantify the minimum steering effort to avoid a predicted collision. A reachability-based ellipsoidal collision filter ensures only dynamically plausible threats are scored, with frame-level matching and track-level aggregation. Evaluation of different perception pipelines on nuScenes and Argoverse 2 shows that 65-93% of errors are non-critical, and Spearman correlation analysis confirms that all three metrics capture safety-relevant information inaccessible to established time-based, deceleration-based, or normalized criticality measures, enabling targeted mining of the most critical perception failures.
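
The two longitudinal metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that MDR is a peak braking demand under a constant-acceleration model and that FSR is a cumulative velocity loss. The stopping formula $a = v_{\text{rel}}^2 / (2d)$, the fixed frame interval `dt_s`, and all function names below are assumptions made for illustration.

```python
import math
from typing import Sequence


def braking_demand(gap_m: float, closing_speed_mps: float) -> float:
    """Constant deceleration (m/s^2) that nulls the closing speed within gap_m.

    Assumed model: a = v_rel^2 / (2 d). The paper's exact formula may differ.
    """
    if closing_speed_mps <= 0.0:
        return 0.0  # not closing in on the object: no braking needed
    if gap_m <= 0.0:
        return math.inf  # no gap left to brake in
    return closing_speed_mps ** 2 / (2.0 * gap_m)


def max_deceleration_rate(per_frame_decel: Sequence[float]) -> float:
    """MDR-style track score: peak per-frame braking demand of a missed object."""
    return max(per_frame_decel, default=0.0)


def false_speed_reduction(per_frame_decel: Sequence[float], dt_s: float) -> float:
    """FSR-style track score: velocity (m/s) lost by braking for a phantom object.

    Assumed aggregation: sum of per-frame deceleration times the frame interval.
    """
    return sum(a * dt_s for a in per_frame_decel)


if __name__ == "__main__":
    # Hypothetical FN track: the gap shrinks while the closing speed stays at 10 m/s.
    fn_track = [braking_demand(gap, 10.0) for gap in (40.0, 30.0, 20.0)]
    print("MDR-style score:", max_deceleration_rate(fn_track))

    # Hypothetical FP tracks sampled every 0.5 s: long and mild vs. short and intense.
    print("FSR-style, long mild FP:", false_speed_reduction([0.3] * 24, 0.5))
    print("FSR-style, short intense FP:", false_speed_reduction([1.5] * 4, 0.5))
```

Under these assumptions the peak operator makes the MDR-style score sensitive to the single worst frame of a track, while the sum makes the FSR-style score sensitive to how long a phantom detection persists.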

Paper Structure

This paper contains 17 sections, 12 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 2: Effort-based criticality assessment of perception errors. The pipeline identifies false positives (FPs) and false negatives (FNs) by matching tracker outputs against ground-truth (GT) annotations (green boxes), performs reachable-set collision analysis, and computes effort metrics: False Speed Reduction (FSR) captures the cumulative velocity loss from persistent FPs, and Maximum Deceleration Rate (MDR) captures the peak braking demand from missed objects. Left side: FP from AB3DMOT on nuScenes where a phantom detection (yellow) persists for $19$ s ($39$ frames) at ${\sim}20$ m ahead with no matching GT object. FSR accumulates to $7.2$ m/s. Right side: FN from BEVFusion on Argoverse 2 where a real vehicle in the ego lane (red, $R_{\text{lat}}\!\approx\!0.3$ m) is missed for ${\sim}16$ s. MDR peaks at $2.4$ m/s$^2$. LiDAR bird's-eye-view panels show multi-sweep point clouds (gray) with GT annotations and the erroneous detection/miss highlighted. Metrics are computed only during frames with perception errors.
  • Figure 3: Collision avoidance strategies between the ego vehicle (gray) and a perception error (red). Left: Two independent avoidance options are evaluated per frame: (i) longitudinal braking ($a_{\text{brake}}$) and (ii) lateral evasion, either by steering away ($a_{\text{req},\perp,\text{widen}}$) or steering past ($a_{\text{req},\perp,\text{cross}}$). Right: Lateral evasion geometry as described in Sec. \ref{sec:lateral}.
  • Figure 4: Distribution of non-safe perception errors (absolute track counts) on the nuScenes validation set. Rows: Car / Truck. Columns: FN by MDR, FP by FSR, all by LEA. Severity thresholds follow Tab. \ref{tab:criticality_thresholds}.
  • Figure 5: Scenario-level analysis for scene 26a6b03c (nuScenes, BEVFusion). Each row shows, from left to right: bird's-eye trajectory evolution with start/end markers, reachability-based TTC ($\text{TTC}_{\text{RSB}}$), braking effort ($a_{\text{brake}}$), lateral evasion effort (LEA), and the cumulative criticality metric (MDR for FNs, FSR for FPs). Top: three FN objects ($\text{FN}_{1\text{-}3}$) illustrate how MDR saturates for both in-path and laterally distant objects, while LEA disambiguates true collision risk. Bottom: three FP objects ($\text{FP}_{1\text{-}3}$) demonstrate FSR's persistence-sensitive accumulation. $\text{FP}_1$ (24 frames, low per-frame braking) scores highest, while short-lived FPs with higher intensity receive lower FSR.
  • Figure 6: Classical TTC ($d/v_{\text{rel}}$) vs. minimum distance for effort-critical car-category tracks on the nuScenes validation set. Dashed line: TTC$=2$ s. A substantial fraction of MDR-critical FN tracks cluster at TTC${>}3$ s or at the dataset cap of $10$ s, indicating that classical TTC rates them as safe despite high braking demand.
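
The classical TTC baseline referenced in Figure 6 can be sketched as follows. The formula $d/v_{\text{rel}}$, the $10$ s cap, and the $2$ s threshold come from the caption. The function names and the choice to return the cap for non-closing objects are assumptions made for illustration, not details confirmed by the source.

```python
def classical_ttc(distance_m: float, closing_speed_mps: float, cap_s: float = 10.0) -> float:
    """Classical time-to-collision d / v_rel in seconds, clipped at cap_s.

    Assumption: an object that is not closing in (v_rel <= 0) receives the cap.
    """
    if closing_speed_mps <= 0.0:
        return cap_s
    return min(distance_m / closing_speed_mps, cap_s)


def rated_safe_by_ttc(ttc_s: float, threshold_s: float = 2.0) -> bool:
    """True when the TTC lies above the threshold drawn as a dashed line in Figure 6."""
    return ttc_s > threshold_s


if __name__ == "__main__":
    # Hypothetical missed vehicle 40 m ahead with a closing speed of 10 m/s.
    ttc = classical_ttc(40.0, 10.0)
    print(f"TTC = {ttc:.1f} s, rated safe by TTC: {rated_safe_by_ttc(ttc)}")
```

A track scored this way can land above the TTC threshold even when an effort-based metric flags it, which is the discrepancy the figure is meant to expose.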