InfoDisent: Explainability of Image Classification Models by Information Disentanglement

Łukasz Struski, Dawid Rymarczyk, Jacek Tabor

TL;DR

InfoDisent tackles the Explainable AI gap by uniting the flexibility of post-hoc explanations with the interpretability of prototypical concepts through information disentanglement. It introduces a novel architecture that freezes a pretrained backbone and learns an orthogonal channel transform, sparse pooling, and a nonnegative final classifier to produce atomic, prototypical parts that can be visualized as patches and heatmaps. The method yields local and global explanations, supports both positive and negative contributions, and generalizes prototypical explanations to large-scale datasets like ImageNet. Empirical results across datasets and user studies show competitive accuracy and strong human interpretability, with statistically significant improvements in user understanding and disambiguation compared to baselines. Overall, InfoDisent provides a scalable, backbone-agnostic XAI framework with practical implications for deploying interpretable models in real-world, high-stakes settings.

Abstract

In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).

Paper Structure (37 sections, 6 equations, 27 figures, 8 tables)

This paper contains 37 sections, 6 equations, 27 figures, 8 tables.

Figures (27)

  • Figure 1: Decision explanation constructed by InfoDisent for the pre-trained ViT feature space on an Agaric mushroom image from ImageNet. The ViT's decision to assign the class Agaric to the image on the left can be traced to the presence of a hat (569), a white leg (728), a reddish shine (552), a strawberry texture (297), and ground with moss (311). In the prototype block (right), each row represents one prototypical part, identified by its channel number. The yellow boxes in each row mark the activation of that prototypical part, and the first column shows the activation of the corresponding prototypical part in the original image.
  • Figure 2: Our image classification interpretation model, InfoDisent, features three main components: a pre-trained backbone, a pooling layer for key features, and a fully connected layer. The CNN/transformer backbone, with frozen weights, is not further trained. The pooling layer extracts features from the last transformer or convolutional layer and identifies key positive and negative features. These are then combined into a dense vector, which is processed by a fully connected linear layer with nonnegative entries in the final stage.
  • Figure 3: Prototypes from channels 689 to 692 of a ResNet-50 trained on ImageNet. Each row displays the 5 most significant patches from a single prototypical channel. The prototype's activations are highlighted by yellow boxes.
  • Figure 4: Example explanation (hen) for a ResNet-50 backbone, provided by InfoDisent in the form of prototypical parts.
  • Figure 5: The image demonstrates how to analyze and visualize decisions made by InfoDisent.
  • ...and 22 more figures
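
The pipeline described in the Figure 2 caption (frozen backbone features, an orthogonal channel transform, sparse pooling of positive and negative responses, and a nonnegative linear classifier) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The function names, the 2x2 rotation standing in for a learned orthogonal matrix, the pooling formula `max(relu(x)) - max(relu(-x))`, and the clamping used to keep classifier weights nonnegative are all assumptions made for illustration; the toy `features` list stands in for the output of a frozen backbone.

```python
import math

def rotation(theta):
    """2x2 rotation matrix: a tiny stand-in for a learned orthogonal map (assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transform_channels(features, U):
    """Apply U to the channel vector at every spatial position.

    features: list of spatial positions, each a list of C channel values
              (stands in for the frozen backbone's last-layer output).
    """
    return [[sum(u * x for u, x in zip(row, pos)) for row in U] for pos in features]

def sparse_pool(features):
    """Per channel, keep only the strongest positive and strongest negative response.

    Assumed form: max(relu(x)) - max(relu(-x)) over spatial positions.
    """
    pooled = []
    for k in range(len(features[0])):
        column = [pos[k] for pos in features]
        pos_peak = max(max(v, 0.0) for v in column)
        neg_peak = max(max(-v, 0.0) for v in column)
        pooled.append(pos_peak - neg_peak)
    return pooled

def classify(pooled, raw_weights):
    """Linear classifier whose weights are clamped to be nonnegative (assumption)."""
    W = [[max(w, 0.0) for w in row] for row in raw_weights]
    return [sum(w * v for w, v in zip(row, pooled)) for row in W]

# Toy "backbone output": 3 spatial positions, 2 channels (hypothetical values).
features = [[1.0, 0.0], [0.0, -2.0], [0.5, 0.5]]
U = rotation(0.3)
pooled = sparse_pool(transform_channels(features, U))
logits = classify(pooled, [[2.0, -1.0], [0.0, 1.0]])
print(logits)
```

Because each pooled value comes from at most two spatial positions per channel, and the classifier weights are nonnegative, each channel's contribution to a class logit can be traced back to specific image patches. That is the property the prototype visualizations in the figures rely on.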