Explaining Image Classifiers

Hana Chockler; Joseph Y. Halpern

TL;DR

This paper examines how to explain image classifiers using the Halpern-Pearl causal framework. It argues that MMTS depart from Halpern's definition of explanation and proposes using the full definition, which also handles explanations of absence and of rare events. Classifiers are modeled as probabilistic causal systems, and the paper analyzes how MMTS's two-layer, pixel-only dependency structure relates to Halpern's notions of actual causation and explanation, including the role of context sets K and goodness measures. The authors argue that Halpern's definition subsumes MMTS's insights while giving richer explanations, and they discuss how domain knowledge can make explanations of negative outcomes and rare events more tractable. The aim is a firmer theoretical grounding for image-classifier explanations, extending beyond positive-label predictions and standard feature attributions.

Abstract

We focus on explaining image classifiers, taking the work of Mothilal et al. [2021] (MMTS) as our point of departure. We observe that, although MMTS claim to be using the definition of explanation proposed by Halpern [2016], they do not quite do so. Roughly speaking, Halpern's definition has a necessity clause and a sufficiency clause. MMTS replace the necessity clause by a requirement that, as we show, implies it. Halpern's definition also allows agents to restrict the set of options considered. While these differences may seem minor, as we show, they can have a nontrivial impact on explanations. We also show that, essentially without change, Halpern's definition can handle two issues that have proved difficult for other approaches: explanations of absence (when, for example, an image classifier for tumors outputs "no tumor") and explanations of rare events (such as tumors).
