Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models

Matteo Pennisi; Giovanni Bellitto; Simone Palazzo; Mubarak Shah; Concetto Spampinato

TL;DR

DiffExplainer tackles the challenge of global model explainability by synthesizing multimodal explanations through a text-conditioned diffusion framework. It optimizes text embeddings to generate images that reveal what a classifier has learned, and uses language-driven segmentation to automatically identify core versus spurious features. The approach yields high-quality explanatory visuals, enables textual explanations, and uncovers biases such as dataset-induced associations, demonstrated through extensive experiments and a user study. This cross-modal methodology holds practical potential for auditing high-stakes models and enhancing transparency.

Abstract

We present DiffExplainer, a novel framework that leverages language-vision models to enable multimodal global explainability. DiffExplainer employs diffusion models conditioned on optimized text prompts to synthesize images that maximize the class outputs and hidden features of a classifier, thus providing a visual tool for explaining its decisions. Moreover, analyzing the generated visual descriptions allows biases and spurious features to be identified automatically, whereas traditional methods often rely on manual intervention. The cross-modal transferability of language-vision models also makes it possible to describe decisions in a more human-interpretable way, i.e., through text. We conduct comprehensive experiments, including an extensive user study, that demonstrate the effectiveness of DiffExplainer in 1) generating high-quality images that explain model decisions, surpassing existing activation maximization methods, and 2) automatically identifying biases and spurious features.
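The core idea of optimizing a prompt embedding so that a frozen generator produces inputs that maximize a classifier output can be illustrated with a toy sketch. This is not the authors' implementation: the matrices `G` (standing in for the text encoder plus diffusion model) and `W` (standing in for the classifier) are hypothetical linear maps chosen only so the example runs without dependencies, and the log-probability objective, step size, and step count are assumptions.

```python
import math

# Hypothetical frozen "generator": maps a 3-d soft prompt to a 4-pixel "image".
# In the paper this role is played by a text encoder and a latent diffusion model.
G = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5],
     [0.5, -0.5, 1.0],
     [-1.0, 0.5, 0.0]]

# Hypothetical frozen "classifier": maps the 4-pixel image to 3 class logits.
W = [[0.5, -0.2, 0.1, 0.0],
     [-0.3, 0.8, 0.0, 0.2],
     [0.1, 0.1, -0.6, 0.4]]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(prompt):
    """Generate an 'image' from the prompt and classify it."""
    image = matvec(G, prompt)
    return softmax(matvec(W, image))


def optimize_prompt(target, steps=200, lr=0.5):
    """Gradient ascent on log p(target) with respect to the prompt only.

    G and W stay fixed; only the soft prompt is updated.
    """
    prompt = [0.0] * len(G[0])
    for _ in range(steps):
        probs = class_probs(prompt)
        # Gradient of log-softmax w.r.t. the logits: onehot(target) - probs.
        dlogits = [(1.0 if k == target else 0.0) - p
                   for k, p in enumerate(probs)]
        # Chain rule back through the classifier (W^T), then the generator (G^T).
        dimage = [sum(W[k][i] * dlogits[k] for k in range(len(W)))
                  for i in range(len(G))]
        dprompt = [sum(G[i][j] * dimage[i] for i in range(len(G)))
                   for j in range(len(prompt))]
        prompt = [p + lr * g for p, g in zip(prompt, dprompt)]
    return prompt


if __name__ == "__main__":
    before = class_probs([0.0, 0.0, 0.0])[0]
    learned = optimize_prompt(target=0)
    after = class_probs(learned)[0]
    print(f"p(class 0) before: {before:.3f}, after: {after:.3f}")
```

Only the prompt receives updates, which mirrors the abstract's description of conditioning on optimized text prompts while the generator and the classifier under inspection remain unchanged. A real pipeline would replace the two linear maps with differentiable neural modules and rely on automatic differentiation.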

Paper Structure (13 sections, 6 equations, 10 figures)

Introduction
Related Work
Method
Latent Diffusion Models
Explanation Synthesis
Spurious Feature Discovery
Experimental Results
Qualitative comparison with Activation Maximization Methods
Automated Spurious Feature Discovery
User Study for Feature Preference
Towards Text Explanations
Ethnicity bias discovery
Conclusions

Figures (10)

Figure 1: DiffExplainer architecture. Top: an optimized soft prompt is fed to a text encoder for conditioning a LDM; in turn, this generates images that maximize a specific hidden feature or the output of a classifier. Bottom: automated discovery of spurious features based on the pixel-level segmentation of synthesized images.
Figure 2: Visual comparison of activation maximization strategies (Yosinski et al., 2015; Mahendran et al., 2016; Nguyen et al., 2016) for a CaffeNet trained on ImageNet. DiffExplainer generates higher-quality images than the competing methods.
Figure 3: Discovering classifier bias: generated samples for the CaffeNet classifier, for the "dog sled", "bee" and "flagpole" classes. The method of Nguyen et al. (2016) produces images from the latent GAN distribution, which inherently reflect the dataset used for training rather than the features used to classify. Our approach, instead, generates samples where the main category is absent, thereby elucidating specific biases: "dog sled" is discerned through the presence of dogs, snow, and trees; "bee" through the presence of flowers; and "flagpole" through the presence of the US flag.
Figure 4: Examples showcasing the benefits of interpretability operating on text embeddings. By prefixing the soft prompts with "A texture of", and "A shape of", DiffExplainer synthesizes images that maximize the output for the specified class, revealing the textures and shapes most influential in the model's decision-making process.
Figure 5: Examples of agreement (top) and disagreement (bottom) between Salient ImageNet annotations and DiffExplainer-generated images. Each block displays the original sample in the first row, heatmaps from Salient ImageNet in the second row, and images generated by DiffExplainer in the third row. Columns represent neural features: core features (in green) and spurious features (in red) according to Salient ImageNet.
...and 5 more figures
