GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers

Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle

TL;DR

GIFT tackles the challenge of producing global, faithful textual explanations for vision classifiers by chaining local, faithful counterfactuals into change captions, aggregating those signals with an LLM to form global hypotheses, and rigorously verifying candidates with causal metrics via image interventions. The four-stage framework—local counterfactuals, change-captioning, global hypothesis generation, and causal verification—yields interpretable explanations that generalize across diverse domains (CLEVR, CelebA, BDD-OIA) and reveal both rules and biases in vision models. The method introduces CaCE and $\hat{\text{PNS}}$ as complementary causal metrics and demonstrates the necessity of stage 2 change-captioning and stage 4 verification for faithful explanations. The work advances practical interpretability in safety-critical settings and provides a reusable pipeline and codebase to broaden applicability and bias/failure analysis in complex vision systems.

Abstract

Understanding deep models is crucial for deploying them in safety-critical applications. We introduce GIFT, a framework for deriving post-hoc, global, interpretable, and faithful textual explanations for vision classifiers. GIFT starts from local faithful visual counterfactual explanations and employs (vision) language models to translate those into global textual explanations. Crucially, GIFT provides a verification stage measuring the causal effect of the proposed explanations on the classifier decision. Through experiments across diverse datasets, including CLEVR, CelebA, and BDD, we demonstrate that GIFT effectively reveals meaningful insights, uncovering tasks, concepts, and biases used by deep vision classifiers. The framework is released at https://github.com/valeoai/GIFT.

Paper Structure

This paper contains 53 sections, 18 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: Overview of GIFT. Given a classifier $M$ (here discriminating images with a 'red metal object'), GIFT extracts explanations in four stages. Stage 1 generates local visual counterfactual explanations for several images. The counterfactuals are by nature faithful to the classifier, as they reveal semantic and minimal changes to the query images that flip the classifier's output. Stage 2 translates the visual differences between original and counterfactual images into natural language with an image change-captioning model; this enhances interpretability but risks introducing noise. Stage 3 applies an LLM to aggregate local explanations into candidate global explanations; this disambiguates local evidence but is vulnerable to LLM hallucinations. Lastly, Stage 4 validates or filters out these global explanations with intervention studies, to ensure faithfulness with respect to the classifier.
  • Figure 2: Causal interventions, Stage 4. For a candidate explanation $e$ (e.g., 'class 1 = presence of a red object'), we use an image-editing model to add or remove the underlying concept $c_e$ (e.g., 'red object') and observe the impact on the classification outcome, which is aggregated to compute the CaCE (Eq. \ref{eq:cace}) and $\hat{\text{PNS}}$ (Eq. \ref{eq:pns}). In the example, the classifier $M$ recognizes images with a 'red metal object', and we observe, as expected, a partial causal effect: removing red objects impacts the outcome, but inserting non-metal red ones does not.
  • Figure 3: Samples from the intervention study on the CelebA-'Old' classifier. The combined concepts under scrutiny are: 'Glasses', 'Wrinkles on Forehead', 'Wrinkles around Eyes'. Each pair shows the query image on the left and the edited image on the right.
  • Figure 4: GIFT output for the biased classifier on BDD-OIA (Xu et al., 2020). The model $M$, which classifies images into 'Can/Cannot turn right', is intentionally biased so that vehicles in the left lane yield 'Cannot turn right'. We illustrate the output of Stages 1 and 2 for a single randomly selected input and the global output of Stages 3 and 4. Causal metrics are computed when $\text{DI} \geq 15\%$. DI, CaCE and $\hat{\text{PNS}}$ are reported in %.
  • Figure 5: Image intervention for the 'Dense Traffic in Left Lane' explanation. The model is biased so that vehicles in the left lane yield the 'Cannot turn right' output. Left: query images; Right: images $x^{+c}$ and $x^{-c}$ with the concept added and removed, respectively.
  • ...and 9 more figures
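
The Stage 4 intervention described in the captions of Figures 2 and 5 can be illustrated with a minimal sketch. It assumes the commonly used definition of CaCE as the difference between the mean classifier output on images with the concept added ($x^{+c}$) and with the concept removed ($x^{-c}$); the paper's own equation is not reproduced in this section, so treat the formula as an assumption. Scenes are toy lists of (color, material) pairs standing in for images, and `toy_classifier`, `add_red_object`, and `remove_red_objects` are hypothetical stand-ins for the classifier $M$ and the image-editing model. $\hat{\text{PNS}}$ is left out because its definition does not appear here.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for an image: a list of (color, material) objects.
Scene = List[Tuple[str, str]]


def cace(
    classifier: Callable[[Scene], float],
    with_concept: Sequence[Scene],
    without_concept: Sequence[Scene],
) -> float:
    """Assumed CaCE: mean output on x^{+c} minus mean output on x^{-c}."""
    scores_with = [classifier(x) for x in with_concept]
    scores_without = [classifier(x) for x in without_concept]
    return sum(scores_with) / len(scores_with) - sum(scores_without) / len(scores_without)


def toy_classifier(scene: Scene) -> float:
    """Hypothetical classifier M: fires only on a 'red metal object'."""
    return 1.0 if ("red", "metal") in scene else 0.0


def add_red_object(scene: Scene) -> Scene:
    """Hypothetical edit x^{+c}: insert a red, non-metal object."""
    return scene + [("red", "rubber")]


def remove_red_objects(scene: Scene) -> Scene:
    """Hypothetical edit x^{-c}: delete every red object."""
    return [obj for obj in scene if obj[0] != "red"]


queries: List[Scene] = [
    [("red", "metal"), ("blue", "rubber")],
    [("blue", "metal")],
    [("red", "rubber")],
    [("green", "rubber"), ("red", "metal")],
]

# Candidate explanation under test: 'class 1 = presence of a red object'.
effect = cace(
    toy_classifier,
    [add_red_object(q) for q in queries],
    [remove_red_objects(q) for q in queries],
)
print(f"CaCE for 'red object': {effect:.2f}")
```

In this toy setup the score is 0.5 rather than 1.0, mirroring the partial causal effect noted in Figure 2: removing red objects always suppresses the positive class, while inserting a non-metal red object never triggers it.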