COMIX: Compositional Explanations using Prototypes

Sarath Sivaprasad, Dmitry Kangin, Plamen Angelov, Mario Fritz

TL;DR

COMiX addresses the interpretability gap by offering by-design explanations that faithfully reflect the model's decision process. It builds a B-Cos-based encoder to obtain interpretable embeddings, selects a small set of class-defining features via mutual information and pseudo-labels, and retrieves matching prototypical regions from the training data, justifying each prediction through per-feature prototype explanations and majority voting. The approach yields high fidelity and sparsity, scores strongly on interpretability metrics (including a notable $48.82\%$ improvement in C-insertion on ImageNet), and achieves competitive accuracy across diverse datasets with potential zero-shot generalization. By linking test decisions directly to training data, COMiX enables both factual and counterfactual interpretations and opens avenues for segmentation and safety-critical deployments, while supporting reproducibility through detailed appendices and a forthcoming code release.
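
The feature-selection step mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the histogram-based mutual information estimator, the quantile binning, and the names `mutual_information` and `select_features` are assumptions made for illustration, and the pseudo-labels are taken as a given integer array.

```python
import numpy as np

def mutual_information(feature, labels, n_bins=8):
    """Histogram estimate of I(feature; label) in nats (illustrative estimator)."""
    # Quantile edges so that each bin holds roughly the same number of samples.
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    binned = np.digitize(feature, edges)              # bin index in 0..n_bins-1
    classes, label_idx = np.unique(labels, return_inverse=True)
    # Joint histogram over (feature bin, class), normalised to a distribution.
    joint = np.zeros((n_bins, classes.size))
    np.add.at(joint, (binned, label_idx), 1.0)
    joint /= joint.sum()
    p_f = joint.sum(axis=1, keepdims=True)            # marginal over bins
    p_y = joint.sum(axis=0, keepdims=True)            # marginal over classes
    nz = joint > 0                                    # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_f @ p_y)[nz])))

def select_features(embeddings, labels, k):
    """Return the indices of the k embedding dimensions with the highest MI."""
    scores = np.array([mutual_information(embeddings[:, j], labels)
                       for j in range(embeddings.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

# Toy usage: dimension 2 is made informative about the (pseudo-)labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
emb = rng.normal(size=(400, 5))
emb[:, 2] = labels + 0.05 * rng.normal(size=400)
top, scores = select_features(emb, labels, k=1)
```

Keeping only a handful of high-scoring dimensions is what makes the resulting explanation sparse: each retained feature later contributes one prototype match.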

Abstract

Aligning machine representations with human understanding is key to improving interpretability of machine learning (ML) models. When classifying a new image, humans often explain their decisions by decomposing the image into concepts and pointing to corresponding regions in familiar images. Current ML explanation techniques typically either trace decision-making processes to reference prototypes, generate attribution maps highlighting feature importance, or incorporate intermediate bottlenecks designed to align with human-interpretable concepts. The proposed method, named COMIX, classifies an image by decomposing it into regions based on learned concepts and tracing each region to corresponding ones in images from the training dataset, ensuring that explanations fully represent the actual decision-making process. We dissect the test image into selected internal representations of a neural network to derive prototypical parts (primitives) and match them with the corresponding primitives derived from the training data. In a series of qualitative and quantitative experiments, we theoretically prove and empirically demonstrate that our method, in contrast to post hoc analysis, provides fidelity of explanations, and we show that its efficiency is competitive with other inherently interpretable architectures. Notably, it shows substantial improvements in fidelity and sparsity metrics, including a 48.82% improvement in the C-insertion score on the ImageNet dataset over the best state-of-the-art baseline.
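
The matching idea described above, where each selected primitive of the test image is traced to a training example and the retrieved examples determine the class, can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: it assumes each feature is a single scalar per image and uses absolute difference as the distance, whereas the method operates on image regions. The function name `explain_and_classify` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def explain_and_classify(test_emb, train_emb, train_labels, feature_ids):
    """For each selected feature, retrieve the closest training sample, then vote."""
    evidence = []
    for j in feature_ids:
        # Nearest training sample along feature j only (assumed scalar distance).
        nearest = int(np.argmin(np.abs(train_emb[:, j] - test_emb[j])))
        evidence.append((int(j), nearest, train_labels[nearest]))
    # The prediction is the majority label among the retrieved training samples,
    # so the evidence list is exactly what produced the decision.
    votes = Counter(label for _, _, label in evidence)
    prediction = votes.most_common(1)[0][0]
    return prediction, evidence

# Toy usage with three training images and three selected features.
train_emb = np.array([[0.0, 0.0, 0.0],
                      [1.0, 1.0, 1.0],
                      [0.1, 0.9, 0.1]])
train_labels = ["cat", "dog", "cat"]
test_emb = np.array([0.05, 1.0, 0.0])
prediction, evidence = explain_and_classify(test_emb, train_emb, train_labels, [0, 1, 2])
```

In this toy case two features retrieve a "cat" image and one retrieves a "dog" image, so the vote yields "cat" while the dissenting match remains visible in `evidence`.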

Paper Structure (37 sections, 1 theorem, 15 equations, 13 figures, 7 tables, 1 algorithm)

This paper contains 37 sections, 1 theorem, 15 equations, 13 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Assume $\tilde{g}(\mathbf{x}; \theta) = g(\mathbf{x}; \theta)\ \forall g(\mathbf{x}; \theta) \in G(\mathbf{x}; \theta)$. Then the explanation $E(\mathbf{x}; \mathcal{D})$ is sufficient for the prediction $G(\mathbf{x}; \theta, \mathcal{D})$ according to Algorithm 1.

Figures (13)

  • Figure 1: Humans often make sense of new or complex objects by comparing their parts to previously encountered prototypes (Smith et al., 1974). For example, when describing something unfamiliar, people tend to point out resemblances between parts of the new object and familiar prototypes by stating that ‘this part of the object looks like that other one I have seen before’. We propose a method to classify an image by decomposing it into regions based on learned concepts and tracing each region to the corresponding regions in images from the training dataset. We refer to such interpretations as 'COMiX panels'.
  • Figure 2: COMiX method overview.
  • Figure 3: Examples of COMiX panel interpretations for the Oxford-IIIT Pets (left) and CUB-200-2011 (right) datasets.
  • Figure 4: Final prediction vs. pseudo-label confusion matrix on the Oxford-IIIT Pets dataset.
  • Figure 5: Interpretation for a sample image from the Oxford-IIIT Pets dataset: the model correctly classifies the input image as 'Bombay cat'. This visualization demonstrates the similarity between the test image and seven training images of the 'Bombay cat' class and one image of a boxer dog (highlighted in red), offering insight into the model's decision-making process.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof