Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models
Tahira Kazimi, Ritika Allada, Pinar Yanardag
TL;DR
DiffEx is a training-free framework that explains classifier decisions by editing images with text-to-image diffusion models, guided by a hierarchical corpus of semantics mined from vision-language models. A beam-search-style strategy over paths in this hierarchy identifies both coarse and fine-grained attributes that influence the classifier's logits, so the method can explain single-concept as well as complex-scene classifiers. The approach is demonstrated across diverse domains (faces, birds, plant health, retina, fashion, etc.) and is reported to give more interpretable and better-disentangled explanations than Grad-CAM and StylEx, with richer, hierarchical semantics. The work positions diffusion models as a versatile tool for model transparency, particularly in high-stakes settings where understanding decision factors is crucial.
Abstract
Classifiers are core components of many computer vision tasks and serve as the backbone of a wide variety of models across diverse applications. However, understanding how classifiers reach their decisions remains a significant challenge. We propose DiffEx, a novel method that leverages text-to-image diffusion models to explain classifier decisions. Unlike traditional GAN-based explainability models, which are limited to simple, single-concept analyses and typically require training a new model for each classifier, our approach can explain classifiers that focus on single concepts (such as faces or animals) as well as those that handle complex scenes involving multiple concepts. DiffEx employs vision-language models to create a hierarchical list of semantics, allowing users to identify not only the overarching semantic influences on classifiers (e.g., the 'beard' semantic in a facial classifier) but also their sub-types, such as a 'goatee' or 'Balbo' beard. Our experiments demonstrate that DiffEx covers a significantly broader spectrum of semantics than its GAN counterparts, providing a hierarchical tool that delivers a more detailed and fine-grained understanding of classifier decisions.
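To make the idea of searching a semantic hierarchy more concrete, the following is a minimal Python sketch of a beam search over coarse-to-fine semantic paths. It is an illustration under stated assumptions, not the authors' implementation: the function `beam_search_semantics`, the toy hierarchy, and the lookup table of scores are all hypothetical. In the actual method, the score of a path would presumably come from editing an image with a diffusion model and measuring the change in the classifier's logit; here that step is replaced by a hard-coded table so the example runs on its own.

```python
from typing import Any, Callable, Dict, List, Tuple

Path = Tuple[str, ...]


def beam_search_semantics(
    hierarchy: Dict[str, Any],
    score_fn: Callable[[Path], float],
    beam_width: int = 2,
) -> List[Tuple[Path, float]]:
    """Explore a semantic hierarchy level by level.

    At each level only the `beam_width` highest-scoring paths are kept
    and expanded into their sub-types. Returns every kept path with its
    score, sorted from most to least influential.
    """
    results: List[Tuple[Path, float]] = []
    # Start from the coarse, top-level semantics.
    frontier = [((name,), children) for name, children in hierarchy.items()]
    while frontier:
        scored = [(path, children, score_fn(path)) for path, children in frontier]
        scored.sort(key=lambda item: item[2], reverse=True)
        kept = scored[:beam_width]
        results.extend((path, score) for path, _, score in kept)
        # Expand only the kept paths into their finer-grained sub-types.
        frontier = [
            (path + (name,), sub)
            for path, children, _ in kept
            for name, sub in children.items()
        ]
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical hierarchy; the paper mines this from a vision-language model.
TOY_HIERARCHY = {
    "beard": {"goatee": {}, "Balbo": {}},
    "glasses": {"sunglasses": {}},
    "hat": {},
}

# Assumed stand-in for "edit the image, then measure the logit change".
TOY_SHIFTS = {
    ("beard",): 0.8,
    ("beard", "goatee"): 0.9,
    ("beard", "Balbo"): 0.5,
    ("glasses",): 0.3,
    ("glasses", "sunglasses"): 0.4,
    ("hat",): 0.1,
}


def toy_score(path: Path) -> float:
    return TOY_SHIFTS.get(path, 0.0)


ranked = beam_search_semantics(TOY_HIERARCHY, toy_score, beam_width=2)
for path, score in ranked:
    print(" > ".join(path), score)
```

With a beam width of 2, the low-scoring 'hat' branch is pruned at the first level, and at the second level the sub-types of 'beard' outscore 'sunglasses', so the search concentrates on the semantics with the largest assumed effect on the classifier. A wider beam trades more diffusion edits and classifier evaluations for broader coverage of the hierarchy.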
