Explaining Object Detectors via Collective Contribution of Pixels

Toshinori Yamauchi; Hiroshi Kera; Kazuhiko Kawamoto

TL;DR

This paper addresses the problem of explaining object detectors by accounting for the collective contribution of pixels rather than treating each pixel independently. It introduces VX-CODE, a greedy, patch-based explanation method that combines Shapley values and interactions to capture both individual and joint pixel influences on bounding-box localization and class decisions, supported by the novel pi*-index for sequential coalitional analysis. In experiments with DETR and Faster R-CNN on COCO and VOC, VX-CODE achieves better insertion and deletion AUC scores than state-of-the-art baselines, with notable gains as the patch group size r increases. Further analyses cover bias detection, failure cases, and adaptation to object-level foundation models such as Grounding DINO. The work provides a practical, theoretically grounded framework for faithful explanations of detectors, supporting model debugging, bias detection, and interpretability in safety-critical settings.

Abstract

Visual explanations for object detectors are crucial for enhancing their reliability. Object detectors identify and localize instances by assessing multiple visual features collectively. When generating explanations, overlooking these collective influences in detections may lead to missing compositional cues or capturing spurious correlations. However, existing methods typically focus solely on individual pixel contributions, neglecting the collective contribution of multiple pixels. To address this limitation, we propose a game-theoretic method based on Shapley values and interactions to explicitly capture both individual and collective pixel contributions. Our method provides explanations for both bounding box localization and class determination, highlighting regions crucial for detection. Extensive experiments demonstrate that the proposed method identifies important regions more accurately than state-of-the-art methods. The code will be publicly available soon.
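To make the distinction between individual and collective contributions concrete, the following is a minimal sketch that computes exact Shapley values and a pairwise Shapley interaction index for a toy game over three image patches. It uses the standard textbook definitions and is not necessarily the exact formulation used in the paper. The names `toy_score`, `shapley_value`, and `shapley_interaction` are hypothetical, and the toy score stands in for a detector's output; exhaustive subset enumeration like this is only feasible for a handful of patches.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def shapley_value(v, n, i):
    """Exact Shapley value of player i in an n-player game v."""
    others = [p for p in range(n) if p != i]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (v(s | {i}) - v(s))
    return total


def shapley_interaction(v, n, i, j):
    """Exact pairwise Shapley interaction index of players i and j."""
    others = [p for p in range(n) if p not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        # Joint effect of i and j beyond the sum of their separate effects.
        delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
        total += weight * delta
    return total


def toy_score(patches):
    """Hypothetical detection score (an assumption for illustration).

    Patches 0 and 1 only help when both are visible; patch 2 helps alone.
    """
    score = 1.0 if {0, 1} <= patches else 0.0
    if 2 in patches:
        score += 0.2
    return score


n_patches = 3
values = [shapley_value(toy_score, n_patches, i) for i in range(n_patches)]
inter_01 = shapley_interaction(toy_score, n_patches, 0, 1)
inter_02 = shapley_interaction(toy_score, n_patches, 0, 2)
print(values, inter_01, inter_02)
```

In this toy game, patches 0 and 1 each receive a Shapley value of 0.5, yet neither contributes anything alone; the positive interaction between them is what signals that they act as a group, whereas patches 0 and 2 have zero interaction.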
