Revisiting Evaluation of Deep Neural Networks for Pedestrian Detection

Patrick Feifel, Benedikt Franke, Frank Bonarens, Frank Köster, Arne Raulf, Friedhelm Schwenker

TL;DR

This paper revisits pedestrian-detection evaluation by introducing a segmentation-driven error taxonomy and new safety-focused metrics that separate false negatives by occlusion and foreground/background status, as well as false positives by scale, localization, and ghost detections. It presents a simple yet competitive generic pedestrian detector (GPD) framework with multiple backbones and three perception heads, trained on CityPersons and evaluated with a refined methodology that uses semantic and instance segmentation to identify occlusion states. The key contributions are the error-categorized evaluation, the filtered log-average miss rate (FLAMR) metrics, ghost-detection analysis, and an operating-point concept that aims to ensure zero misses for safety-critical foreground pedestrians. The findings show that FLAMR, especially with ghost-detection filtering, can reveal safety-relevant performance differences that traditional LAMR on the reasonable subset may miss, suggesting a more practical, application-oriented path for ADS perception development. Overall, the work advocates integrating segmentation-based error analysis into model evaluation to improve safety-critical pedestrian detection in automated driving contexts, supported by state-of-the-art results on CityPersons with a straightforward architecture.

Abstract

Reliable pedestrian detection is a crucial step towards automated driving systems. However, current performance benchmarks exhibit weaknesses: the metrics currently applied to various subsets of a validation dataset prevent a realistic performance evaluation of a DNN for pedestrian detection. As image segmentation supplies fine-grained information about a street scene, it can serve as a starting point to automatically distinguish between different types of errors during the evaluation of a pedestrian detector. In this work, we propose eight error categories for pedestrian detection, along with new metrics for comparing performance along these categories. We use the new metrics to compare various backbones for a simplified version of the APD, and show a more fine-grained and robust way to compare models with each other, especially in terms of safety-critical performance. We achieve SOTA on CityPersons-reasonable (without extra training data) using a rather simple architecture.

Paper Structure

This paper contains 29 sections, 10 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: State-of-the-art DNNs for pedestrian detection are benchmarked with the log-average miss rate on the reasonable subset of the CityPersons validation dataset (left). From a safety perspective, particularly safety-critical pedestrians, such as the one standing directly in front of the automated vehicle, must be included in the evaluation and not be ignored. Our proposed error categories (right) correctly distinguish between foreground and background, among others. Based on them, we perform an application-oriented performance evaluation of DNNs for pedestrian detection.
  • Figure 2: Incorrectly ignored bounding boxes from the reasonable subset of CityPersons are recovered by our proposed error categories.
  • Figure 3: Incorrectly ignored bounding boxes from the bare subset of CityPersons are re-grouped to background.
  • Figure 4: Categories for ground truth bounding boxes: foreground $\mathcal{F}$, background $\mathcal{B}$, environmental occlusion $\mathcal{E}$, crowd occlusion $\mathcal{C}$, ambiguous occlusion $\mathcal{A}$. Ignored bounding boxes $\mathcal{I}^G$ are not part of the evaluation.
  • Figure 5: Categories for detection bounding boxes: true positives $\text{TP}^D$ (solid), ghost detections $\mathcal{H}$ (dash-dotted), localization errors $\mathcal{L}$ (dashed), scale errors $\mathcal{S}$ (dotted), and ignored detections $\mathcal{I}^D$ (solid).
  • ...and 5 more figures
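
The log-average miss rate (LAMR) mentioned in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the commonly used Caltech-style convention: the miss rate is sampled at nine false-positives-per-image (FPPI) reference points spaced evenly in log space between 10^-2 and 10^0, and the samples are combined with a geometric mean. The function name `log_average_miss_rate` and its parameters are hypothetical, the matching of detections to ground truth is assumed to have been done beforehand, and the paper's filtered variant (FLAMR) and error categories are not reproduced here.

```python
import numpy as np


def log_average_miss_rate(scores, is_tp, num_gt, num_images):
    """Sketch of LAMR under the assumed Caltech-style convention.

    scores     : confidence of each detection (all images pooled)
    is_tp      : True if the detection was matched to a ground-truth box
    num_gt     : number of non-ignored ground-truth boxes
    num_images : number of evaluated images
    """
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_tp, dtype=bool)

    # Sweep the confidence threshold from high to low.
    order = np.argsort(-scores, kind="stable")
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])

    miss_rate = 1.0 - tp / num_gt
    fppi = fp / num_images

    # Nine reference points, evenly spaced in log space over [1e-2, 1e0].
    refs = np.logspace(-2.0, 0.0, 9)
    sampled = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # If no operating point stays within this FPPI budget,
        # treat every pedestrian as missed at that reference point.
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)

    # Clip to avoid log(0), then take the geometric mean.
    sampled = np.maximum(np.array(sampled), 1e-10)
    return float(np.exp(np.mean(np.log(sampled))))
```

Lower values are better. A filtered metric in the spirit of the paper would presumably restrict which ground-truth boxes and detections enter `is_tp` and `num_gt` before this computation, but the exact filtering rules are defined in the paper and are not assumed here.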