Contour Errors: An Ego-Centric Metric for Reliable 3D Multi-Object Tracking

Sharang Kaul, Mario Berk, Thiemo Gerbich, Abhinav Valada

TL;DR

This work targets reliable 3D multi-object tracking in autonomous driving by introducing Contour Errors (CE), an ego-centric matching metric that emphasizes object shape and orientation from the ego vehicle's perspective. CE employs a Hungarian assignment over a contour-based distance, sampling corners near the ego in both 2D and 3D variants and applying a threshold $\tau_E$ to decide matches. Extensive evaluations on nuScenes and KITTI show that CE outperforms traditional IoU and CPD in safety-critical scenarios, especially under partial visibility and yaw misalignment, and reveal important discrepancies in detector rankings when they are assessed in ego-centric terms. The study demonstrates that optimizing for safety-critical matching requires ego-centric, contour-aware metrics, and that relying on a single aggregate score (such as mHOTA) may obscure critical failure modes in real-world driving safety tasks.
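The matching step summarized above (a minimum-cost assignment followed by a threshold test) can be illustrated with a small sketch. Several things here are assumptions rather than details from the paper: the function names `min_cost_assignment` and `match_with_threshold` are hypothetical, the threshold is applied after the assignment using `<=`, and a brute-force search over permutations stands in for the Hungarian algorithm. The brute-force search returns the same minimum-cost assignment on small inputs but does not scale to real scenes.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (row, col) pairs minimizing the total cost.

    Brute-force stand-in for the Hungarian algorithm (assumption: used here
    only to keep the sketch dependency-free; fine for a handful of objects).
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        return []
    if n_rows > n_cols:
        # More rows than columns: solve the transposed problem, swap back.
        transposed = [list(col) for col in zip(*cost)]
        return sorted((i, j) for j, i in min_cost_assignment(transposed))
    best_total, best_cols = float("inf"), ()
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    return list(enumerate(best_cols))


def match_with_threshold(cost, tau):
    """Gate assigned pairs by a threshold tau (hypothetical stand-in for tau_E).

    cost[i][j] is the contour-based distance between ground truth i and
    prediction j. Pairs above tau are rejected, leaving a false negative
    (unmatched ground truth) and a false positive (unmatched prediction).
    """
    pairs = min_cost_assignment(cost)
    matches = [(i, j) for i, j in pairs if cost[i][j] <= tau]
    matched_gt = {i for i, _ in matches}
    matched_pred = {j for _, j in matches}
    n_gt = len(cost)
    n_pred = len(cost[0]) if cost else 0
    false_negatives = [i for i in range(n_gt) if i not in matched_gt]
    false_positives = [j for j in range(n_pred) if j not in matched_pred]
    return matches, false_negatives, false_positives


# Two ground-truth objects, three predictions, distances in meters.
cost = [[0.3, 4.0, 9.0],
        [5.0, 0.8, 7.0]]
matches, fn, fp = match_with_threshold(cost, tau=1.0)
print(matches, fn, fp)
```

In this toy case both ground-truth objects are matched and the third prediction remains unmatched. A production pipeline would more likely rely on an established Hungarian solver than on this exhaustive search.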

Abstract

Finding reliable matches is essential in multi-object tracking to ensure the accuracy and reliability of perception systems in safety-critical applications such as autonomous vehicles. Effective matching mitigates perception errors, enhancing object identification and tracking for improved performance and safety. However, traditional metrics such as Intersection over Union (IoU) and Center Point Distances (CPDs), which are effective in 2D image planes, often fail to find critical matches in complex 3D scenes. To address this limitation, we introduce Contour Errors (CEs), an ego- or object-centric metric for identifying matches of interest in tracking scenarios from a functional perspective. By comparing bounding boxes in the ego vehicle's frame, contour errors provide a more functionally relevant assessment of object matches. Extensive experiments on the nuScenes dataset demonstrate that contour errors improve the reliability of matches over the state-of-the-art 2D IoU and CPD metrics in tracking-by-detection methods. In 3D car tracking, our results show that Contour Errors reduce functional failures (FPs/FNs) by 80% at close ranges and 60% at far ranges compared to IoU in the evaluation stage.

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Motivation for ego-centric Contour Error (CE) association. This figure illustrates scenarios where ego-centric CE association provides more robust performance than standard object-centric metrics, particularly under challenging conditions. Each sub-figure comprises: (Left) The scene category and an exemplar challenging condition. (Middle) The ego-centric camera view with ground-truth (green bounding boxes) and predicted (red bounding boxes) object detections. (Right) The corresponding bird's-eye-view projection onto the HD map. The top and bottom scenarios represent two of our hypotheses: (a) Partial Visibility (Lane-following / Parallel-threat scenario) described in Sec. \ref{sec:hypothesis_1} and (b) Severe Yaw-Misalignment (intersection scenario) described in Sec. \ref{sec:hypothesis_2}. Black boxes denote the ego-agent, green boxes represent ground truth (GT), and red boxes represent predictions. We analyze objects within a 50m radius of the ego vehicle, aligning with the standard perception range evaluated on large-scale autonomous driving datasets \cite{ettinger2021largescaleinteractivemotion, caesar2020nuscenes}, to investigate the potential impact of association errors on collision detection. Numerical CE, IoU, and CPD values presented below each camera image quantify the observed discrepancies between the metrics.
  • Figure 2: Computation of the ego-centric Contour Error (CE) metric for MOT association. Bottom: Tracking scenario at 1s. Top: Predicted state at 3s. The proposed CE is computed symmetrically: (1) the maximum distance from the ground truth (GT) contour to the prediction ($\text{CE}_\text{GT-Pred}$), and (2) vice versa ($\text{CE}_\text{Pred-GT}$). The final metric is $\max(\text{CE}_\text{GT-Pred}, \text{CE}_\text{Pred-GT})$, ensuring consistency under occlusion and perspective change. We contrast CE with standard object-centric metrics (IoU, CPD). As shown in the yellow box, CE remains below the association threshold (green check) while a volumetric metric (IoU) and a distance metric (CPD) fail (red cross), demonstrating its effectiveness for ego-centric perception.
  • Figure 3: Scatter plots of IoU vs. CE for all matches within 5m, 10m, and 15m CE thresholds in pedestrian, car, and truck object categories, respectively. This figure illustrates that the majority of the matches rejected by IoU thresholds (considered functional failures) are not penalized by contour errors. The IoU thresholds are taken from the KITTI Benchmark \cite{geiger2012we}.
  • Figure 4: Scatter plots of CPD vs. CE for all matches within 5m, 10m, and 15m CE thresholds in pedestrian, car, and truck object categories, respectively. Although this figure illustrates a positive correlation between CPDs and CEs, specific failure cases show that they are conceptually different (see Sec. \ref{sec:temporal_analysis}). The CPD thresholds are taken from nuScenes \cite{caesar2020nuscenes}.
  • Figure 5: Qualitative comparison of our ego-centric Contour Error (CE) and object-centric metrics (IoU/CPD) in four different safety-critical interactions. Each row shows two seconds of motion (camera view + four 0.5s BEV HD-map time-series snapshots). Scenario (a) - Intersection + critical lane change (car): CE tracks the laterally cutting vehicle while IoU loses the match due to the perspective distortion. Scenario (b) - Urban driving + critical cut-in (car): CE preserves association during an aggressive lateral intrusion while IoU misclassifies the overlap. Scenario (c) - Intersection + Longitudinal criticality (truck): Under a high-speed longitudinal approach, CE remains stable while IoU degrades with scale and blur. Scenario (d) - Pedestrian crossing + critical parallel threat (truck): CE copes with sustained side-by-side occlusion while IoU fails under partial visibility. Metric text is colour-coded with green and red values representing TP and FP/FN, respectively. Ego-agent, ground truth, and predictions are represented by blue, green, and red bounding boxes, respectively.
  • ...and 1 more figures
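
The symmetric computation described in the Figure 2 caption (a directed maximum distance in each direction, then the larger of the two) can be sketched for bird's-eye-view boxes as follows. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, each box is assumed to be given as four 2D corner points in the ego frame, all four corners are used rather than an ego-facing subset, and the distance from a corner to the other box is taken to its contour (edges).

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def directed_contour_error(src_corners, dst_corners):
    """Largest distance from any source corner to the destination contour."""
    n = len(dst_corners)
    edges = [(dst_corners[i], dst_corners[(i + 1) % n]) for i in range(n)]
    return max(
        min(point_segment_distance(p, a, b) for a, b in edges)
        for p in src_corners
    )


def contour_error(gt_corners, pred_corners):
    """Symmetric CE: max of the GT-to-Pred and Pred-to-GT directed errors."""
    ce_gt_pred = directed_contour_error(gt_corners, pred_corners)
    ce_pred_gt = directed_contour_error(pred_corners, gt_corners)
    return max(ce_gt_pred, ce_pred_gt)


# A 4 m x 2 m ground-truth box and a prediction shifted 1 m along x.
gt = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
pred = [(1.0, 0.0), (5.0, 0.0), (5.0, 2.0), (1.0, 2.0)]
print(contour_error(gt, pred))  # 1.0
```

Taking the maximum of both directions keeps the measure symmetric, so a prediction that is much smaller or larger than the ground truth is still penalized even when one directed error is small.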