
Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Maurice Günder, Sneha Banerjee, Rafet Sifa, Christian Bauckhage

TL;DR

This work targets explainability for pedestrian detection by adapting model-agnostic, sampling-based explanation methods to large computer vision models. It frames body-part relevance in terms of semantic regions, using a superpixel surrogate with KernelSHAP-inspired sampling, and introduces a continuous sampling method that draws from a Beta distribution whose parameters are chosen to concentrate samples near $0$ and $1$, which improves robustness at small sample sizes ($\beta$-sampling). Evaluations on RetinaNet50 trained on EuroCity Persons show that Beta sampling yields relevance patterns comparable to KernelSHAP with far fewer samples, and the analysis highlights the torso and head regions as the primary drivers of pedestrian detection. Limitations include the reliance on BodyPix segmentation at street-scene resolutions and the lack of instance-level segmentation. The paper discusses simulation data as a path to richer, controlled analyses and to broader applicability in scene understanding.

Abstract

Model-agnostic explanation methods for deep learning models are flexible in terms of usability and availability. However, because they can only manipulate the input and observe changes in the output, they perform poorly on complex model architectures. For models with large inputs, as in object detection, sampling-based methods like KernelSHAP are inefficient because they require many computation-heavy forward passes through the model. In this work, we present a framework for using sampling-based explanation models in a computer vision context, applied to body part relevance assessment for pedestrian detection. Furthermore, we introduce a novel sampling-based method similar to KernelSHAP that is more robust at lower sample sizes and is thus more efficient for explainability analyses on large-scale datasets.
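The sampling scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the Beta parameters (`a = b = 0.5`), the blending of each body part towards a baseline image, the unweighted least-squares surrogate, and the `toy_detector` stand-in for a real detection score are all assumptions made for illustration.

```python
import numpy as np

def sample_presence(n_samples, n_parts, a=0.5, b=0.5, seed=0):
    """Draw continuous presence values in [0, 1] for each body part.

    a = b = 0.5 is an illustrative choice (not taken from the paper): for
    a, b < 1 the Beta density is U-shaped, so most draws lie near 0 or 1.
    """
    rng = np.random.default_rng(seed)
    return rng.beta(a, b, size=(n_samples, n_parts))

def blend(image, part_masks, presence, baseline):
    """Fade each body part towards the baseline according to its presence."""
    alpha = np.ones(image.shape[:2])      # pixels outside all parts stay untouched
    for mask, p in zip(part_masks, presence):
        alpha[mask] = p
    alpha = alpha[..., None]              # broadcast over colour channels
    return alpha * image + (1.0 - alpha) * baseline

def fit_relevance(presence, scores):
    """Least-squares surrogate: score ~ intercept + presence @ relevance."""
    design = np.column_stack([np.ones(len(presence)), presence])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef[1:]                       # drop the intercept

# Toy setup (hypothetical): a 6x6 "pedestrian" split into three horizontal parts.
height, width = 6, 6
image = np.ones((height, width, 3))
baseline = np.zeros_like(image)
part_masks = np.zeros((3, height, width), dtype=bool)
part_masks[0, 0:2, :] = True              # "head"
part_masks[1, 2:4, :] = True              # "torso"
part_masks[2, 4:6, :] = True              # "legs"
true_weights = np.array([0.3, 0.6, 0.1])  # made-up ground truth for the toy model

def toy_detector(img):
    """Stand-in for a detector score: weighted mean intensity per part."""
    return sum(w * img[m].mean() for w, m in zip(true_weights, part_masks))

presence = sample_presence(n_samples=64, n_parts=3)
scores = np.array([toy_detector(blend(image, part_masks, p, baseline))
                   for p in presence])
relevance = fit_relevance(presence, scores)
print(relevance)
```

Because the toy score is exactly linear in the presence values, the fitted relevances recover `true_weights`. A real detector is not linear, so the surrogate coefficients would only approximate each part's contribution, and each sample costs one forward pass through the model.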

Paper Structure

This paper contains 19 sections, 5 equations, 7 figures.

Figures (7)

  • Figure 1: Concept overview of our approach to model-agnostic body part relevance assessment.
  • Figure 2: Comparison of our masking methods demonstrated on a pedestrian image from the EuroCity Persons dataset [eurocitypersons].
  • Figure 3: Abstraction levels of our body part segmentation. The levels represent the granularity from most detailed (level 0) to least detailed (level 3).
  • Figure 4: Plot of the Beta distribution (given by the paper's Beta density equation) from which the presence vectors of our sampling method are drawn. The red dashed line shows the expectation value (mean) of the distribution.
  • Figure 5: Exemplary body part segmentation by BodyPix [bodypix_github] and the corresponding body part relevance maps of KernelSHAP (middle plot) and our sampling method (second from right). The rightmost plot additionally shows an error map for our method.
  • ...and 2 more figures
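
As background for the Figure 4 caption, the standard Beta density and its mean are given below. The symbols $a$ and $b$ are generic shape parameters used here for illustration; the values chosen in the paper are not stated in this summary.

$$
f(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}, \quad x \in [0, 1], \qquad
B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt, \qquad
\mathbb{E}[X] = \frac{a}{a+b}
$$

For $a, b < 1$ the density is U-shaped, which places most of the probability mass near $0$ and $1$, as described in the TL;DR.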