Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models

Tahira Kazimi, Ritika Allada, Pinar Yanardag

TL;DR

DiffEx presents a training-free framework that explains classifier decisions by editing images with diffusion models guided by a hierarchical semantic corpus mined from vision-language models. It leverages a beam-search-style strategy over semantic paths to identify both coarse and fine-grained attributes that influence logits, enabling explanations for single-concept and complex-scene classifiers. The approach is demonstrated across diverse domains (faces, birds, plant health, retina, fashion, etc.) and shows superior interpretability and disentanglement compared with Grad-CAM and StylEx, while providing richer, hierarchical semantics. The work highlights diffusion models as a versatile tool for model transparency with practical impact in high-stakes settings where understanding decision factors is crucial.
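As a rough illustration of the beam-search-style strategy over semantic paths, the sketch below walks a small hierarchical corpus and keeps only the paths whose edits change the classifier score the most. It is a toy under stated assumptions, not the authors' implementation: the function `beam_search_semantics`, the `score_edit` callback (a stand-in for "edit the image with a diffusion model, then measure the change in the classifier score"), the corpus entries, and every number in `TOY_DELTAS` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hierarchical corpus: each attribute maps to its subtypes.
Corpus = Dict[str, "Corpus"]
Path = Tuple[str, ...]


def beam_search_semantics(
    corpus: Corpus,
    score_edit: Callable[[Path], float],
    beam_width: int = 2,
    max_depth: int = 2,
) -> List[Tuple[Path, float]]:
    """Rank semantic paths by the absolute score change their edit causes.

    At each level only the `beam_width` most influential paths are
    expanded into their subtypes; all other branches are pruned.
    """
    beam: List[Tuple[Path, float]] = [((), 0.0)]  # start at the corpus root
    kept: List[Tuple[Path, float]] = []
    for _ in range(max_depth):
        candidates: List[Tuple[Path, float]] = []
        for path, _ in beam:
            node = corpus
            for key in path:  # descend to the node for this path
                node = node[key]
            for child in node:
                new_path = path + (child,)
                candidates.append((new_path, score_edit(new_path)))
        if not candidates:
            break
        candidates.sort(key=lambda c: abs(c[1]), reverse=True)
        beam = candidates[:beam_width]
        kept.extend(beam)
    kept.sort(key=lambda c: abs(c[1]), reverse=True)
    return kept


# Toy corpus and made-up score changes (assumptions for illustration only).
TOY_CORPUS: Corpus = {
    "beard": {"goatee": {}, "Balbo": {}},
    "glasses": {"sunglasses": {}, "reading glasses": {}},
    "hat": {"beanie": {}},
}
TOY_DELTAS: Dict[Path, float] = {
    ("beard",): -0.30,
    ("glasses",): -0.20,
    ("hat",): 0.05,
    ("beard", "goatee"): -0.25,
    ("beard", "Balbo"): -0.35,
    ("glasses", "sunglasses"): -0.10,
    ("glasses", "reading glasses"): -0.40,
    ("hat", "beanie"): 0.50,
}


def toy_score_edit(path: Path) -> float:
    # Stand-in for: apply the edit described by `path`, re-run the classifier,
    # and return the change in its score relative to the original image.
    return TOY_DELTAS[path]


ranked = beam_search_semantics(TOY_CORPUS, toy_score_edit)
for path, delta in ranked:
    print(" > ".join(path), delta)
```

With a beam width of 2, the "hat" branch is pruned at the first level, so its subtype is never evaluated even though its toy score change is large. This is the usual trade-off of beam search: fewer expensive edits at the cost of possibly missing an influential subtype under a weak parent.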

Abstract

Classifiers are important components in many computer vision tasks, serving as the foundational backbone of a wide variety of models employed across diverse applications. However, understanding the decision-making process of classifiers remains a significant challenge. We propose DiffEx, a novel method that leverages the capabilities of text-to-image diffusion models to explain classifier decisions. Unlike traditional GAN-based explainability models, which are limited to simple, single-concept analyses and typically require training a new model for each classifier, our approach can explain classifiers that focus on single concepts (such as faces or animals) as well as those that handle complex scenes involving multiple concepts. DiffEx employs vision-language models to create a hierarchical list of semantics, allowing users to identify not only the overarching semantic influences on classifiers (e.g., the 'beard' semantic in a facial classifier) but also their sub-types, such as 'goatee' or 'Balbo' beard. Our experiments demonstrate that DiffEx is able to cover a significantly broader spectrum of semantics compared to its GAN counterparts, providing a hierarchical tool that delivers a more detailed and fine-grained understanding of classifier decisions.

Paper Structure

This paper contains 29 sections, 2 equations, 14 figures, 9 tables, and 1 algorithm.

Figures (14)

  • Figure 1: DiffEx explains the decisions of domain-specific classifiers by identifying the most influential semantics affecting their predictions. Classifier scores for each example are displayed in the top-left corner, demonstrating how classifier predictions change in response to the manipulation of different semantics (original images are shown with red borders). Our approach is capable of explaining classifiers that concentrate on individual concepts such as faces or animals (top row) as well as those that manage complex scenes involving multiple objects, such as a formal/casual fit in a fashion context (bottom row).
  • Figure 2: Hierarchical List of Attributes for the Bird Domain. We use VLMs to extract a hierarchical corpus of semantics within a given domain. This structured representation helps to illustrate how different attributes are grouped and their relationships within the broader domain, facilitating a better understanding of how each semantic contributes to the overall decision-making process of a classifier.
  • Figure 3: An Overview of DiffEx. Our pipeline processes a set of sample domain-specific images and a text prompt using a VLM to generate a hierarchical semantic corpus of attributes relevant to a specific domain. Based on this corpus, DiffEx identifies and ranks the most influential features affecting the classifier’s decisions, sorting them from most to least impactful (rightmost image). The hierarchical explanation of semantics (such as beard and its subtypes) provides a fine-grained understanding of which features drive classifier outputs.
  • Figure 4: Top-7 Discovered Facial Attributes for the Age Classifier. DiffEx identifies key attributes and their top hierarchical subtypes for a perceived age classifier in the facial domain. For each attribute, the edited images and their respective subtypes are displayed in a hierarchical structure, outlined with a black border. The score for the "young" label is shown in the top-left corner of each image.
  • Figure 5: Top-7 Discovered Attributes Across Different Animal Domains. Our method successfully identifies key attributes for multiple domains, such as bird, wildcat, and pet species. The original images are shown with red borders, while the edited images are shown with black borders. For the pet species domain, we used a binary classifier, and for the bird and wildcat species domains, we used a multi-class classifier. The impact of each attribute on the classifier's score is shown in the top-left corner of each image. For attributes that caused subtle changes, we provide a zoomed-in view of the edit, displayed in the bottom-left or bottom-right corner of the image.
  • ...and 9 more figures