Revisiting Evaluation of Deep Neural Networks for Pedestrian Detection
Patrick Feifel, Benedikt Franke, Frank Bonarens, Frank Köster, Arne Raulf, Friedhelm Schwenker
TL;DR
This paper revisits pedestrian-detection evaluation by introducing a segmentation-driven error taxonomy and new safety-focused metrics that separate false negatives by occlusion and foreground/background status, and false positives into scale, localization, and ghost-detection errors. It presents a simple yet competitive generic pedestrian detector (GPD) framework with multiple backbones and three perception heads, trained on CityPersons and evaluated with a refined methodology that uses semantic and instance segmentation to identify occlusion states. The key contributions are the error-categorized evaluation, the filtered log-average miss rate (FLAMR) metrics, a ghost-detection analysis, and an operating-point concept that aims to ensure zero misses for safety-critical foreground pedestrians. The findings show that FLAMR, especially with ghost-detection filtering, can reveal safety-relevant performance differences that the conventional log-average miss rate (LAMR) on the reasonable subset may miss, suggesting a more practical, application-oriented path for perception development in automated driving systems (ADS). Overall, the work advocates integrating segmentation-based error analysis into model evaluation to improve safety-critical pedestrian detection in automated driving, supported by state-of-the-art results on the CityPersons reasonable subset with a straightforward architecture.
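To make the LAMR/FLAMR distinction concrete, the sketch below computes the usual log-average miss rate: the geometric mean of the miss rate sampled at nine FPPI (false positives per image) values evenly spaced in log space between 10^-2 and 10^0. The `keep_fp` filter, the `Detection` record, and the `fp_category` label are hypothetical names of ours, and treating "filtering" as simply dropping certain false-positive categories (for example ghost detections) from the count is an assumption, not the paper's exact definition. Detection-to-ground-truth matching is assumed to be done beforehand, and details of the official CityPersons evaluation (ignore regions, matching rules) are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Detection:
    score: float
    is_tp: bool  # True if matched to a counted ground truth (matching done beforehand)
    fp_category: Optional[str] = None  # hypothetical error label for false positives, e.g. "ghost"


def log_average_miss_rate(
    detections: Sequence[Detection],
    num_gt: int,
    num_images: int,
    keep_fp: Callable[[Detection], bool] = lambda d: True,
) -> float:
    """Geometric mean of the miss rate at nine log-spaced FPPI points in [1e-2, 1e0].

    With the default keep_fp this is plain LAMR; passing a filter that drops some
    false-positive categories gives a FLAMR-style variant (an assumed reading).
    """
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]  # 0.01, 0.0178, ..., 1.0
    # (FPPI, miss rate) points, starting from the state where no detection is kept.
    curve = [(0.0, 1.0)]
    tp = fp = 0
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.is_tp:
            tp += 1
        elif keep_fp(det):
            fp += 1
        else:
            continue  # filtered false positive: not counted, does not move the curve
        curve.append((fp / num_images, 1.0 - tp / num_gt))
    log_sum = 0.0
    for ref in refs:
        # Lowest miss rate reachable without exceeding this FPPI budget.
        mr = min(m for f, m in curve if f <= ref)
        log_sum += math.log(max(mr, 1e-10))  # floor avoids log(0) when mr == 0
    return math.exp(log_sum / len(refs))


if __name__ == "__main__":
    dets = [
        Detection(0.9, True),
        Detection(0.8, True),
        Detection(0.7, False, "ghost"),
        Detection(0.6, True),
        Detection(0.5, False, "other"),
    ]
    print(log_average_miss_rate(dets, num_gt=4, num_images=50))
    print(log_average_miss_rate(dets, num_gt=4, num_images=50,
                                keep_fp=lambda d: d.fp_category != "ghost"))
```

On this toy input the unfiltered value is about 0.29, while dropping the single ghost false positive lowers it to 0.25, because the ghost detection outranks the third true positive and so consumes FPPI budget before that pedestrian is found. Filtering ground truths (for example, only counting foreground pedestrians) would, under the same assumptions, amount to passing a smaller `num_gt` and leaving matches to excluded pedestrians out of `detections`.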
Abstract
Reliable pedestrian detection is a crucial step towards automated driving systems. However, current performance benchmarks have weaknesses: the metrics currently applied to various subsets of a validation dataset prevent a realistic performance evaluation of a deep neural network (DNN) for pedestrian detection. Because image segmentation supplies fine-grained information about a street scene, it can serve as a starting point for automatically distinguishing between different types of errors when evaluating a pedestrian detector. In this work, eight error categories for pedestrian detection are proposed, together with new metrics for comparing performance along these categories. We use the new metrics to compare various backbones for a simplified version of the APD, and show a more fine-grained and robust way to compare models with each other, especially in terms of safety-critical performance. We achieve state-of-the-art (SOTA) results on CityPersons-reasonable (without extra training data) using a rather simple architecture.
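The abstract does not spell out how segmentation separates error types. One plausible ingredient, sketched here purely under our own assumptions, is an occlusion check for a missed pedestrian: count the pixels inside its full-body box that belong to a different instance of an occluding semantic class. The function names, the occluder class set, and the 0.35 threshold are hypothetical; the threshold is chosen only by analogy with the CityPersons reasonable subset, which keeps pedestrians with a visibility ratio of at least 0.65. The paper's actual rules for its eight categories may differ.

```python
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive

# Hypothetical set of semantic classes treated as occluders.
OCCLUDER_CLASSES = {"person", "rider", "car"}


def occluder_fraction(
    box: Box,
    instance_ids: Sequence[Sequence[int]],
    semantic: Sequence[Sequence[str]],
    own_instance: int,
) -> float:
    """Share of box pixels covered by an occluder other than the pedestrian itself."""
    height, width = len(instance_ids), len(instance_ids[0])
    # Clip the box to the image so partially visible boxes do not index out of range.
    x0, y0 = max(0, box[0]), max(0, box[1])
    x1, y1 = min(width, box[2]), min(height, box[3])
    total = occluded = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            total += 1
            if instance_ids[y][x] != own_instance and semantic[y][x] in OCCLUDER_CLASSES:
                occluded += 1
    return occluded / total if total else 0.0


def categorize_miss(
    box: Box,
    instance_ids: Sequence[Sequence[int]],
    semantic: Sequence[Sequence[str]],
    own_instance: int,
    threshold: float = 0.35,  # hypothetical cutoff
) -> str:
    """Label a missed pedestrian (false negative) as occluded or unoccluded."""
    frac = occluder_fraction(box, instance_ids, semantic, own_instance)
    return "occluded" if frac >= threshold else "unoccluded"


if __name__ == "__main__":
    # Toy 4x4 scene: pedestrian instance 1 partly hidden behind car instance 2.
    inst = [[1, 1, 0, 0], [1, 2, 0, 0], [2, 2, 0, 0], [2, 2, 0, 0]]
    sem = [
        ["person", "person", "road", "road"],
        ["person", "car", "road", "road"],
        ["car", "car", "road", "road"],
        ["car", "car", "road", "road"],
    ]
    print(categorize_miss((0, 0, 2, 4), inst, sem, own_instance=1))
```

In the toy scene, five of the eight box pixels belong to the car, so the miss is labeled occluded. Excluding the pedestrian's own instance id matters because the pedestrian class itself is also listed as a potential occluder (for example, in crowds).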
