Formal Abductive Latent Explanations for Prototype-Based Networks

Jules Soria, Zakaria Chihani, Julien Girard-Satabin, Alban Grastien, Romain Xu-Darme, Daniela Cancila

TL;DR

Prototype-based networks often rely on intuitive prototype explanations that can be misleading. This work introduces Abductive Latent Explanations (ALEs), a formal explainable AI (FXAI) framework that defines latent-space preconditions $\phi_{\mathcal{E}}$ over $f(\mathbf{x})$ that guarantee the prediction, and provides solver-free, geometry-based algorithms (domain/spatial constraints, the triangle inequality, and hypersphere intersection) to compute verifiable ALEs in the latent space. Across multiple datasets and backbones, ALEs reveal that achieving formal guarantees typically requires large, many-prototype explanations; larger ALEs correlate with incorrect predictions and may signal uncertainty. These findings improve explanation reliability for safety-critical use and highlight fundamental trade-offs between interpretability and fidelity in prototype-based models.

Abstract

Case-based reasoning networks are machine-learning models that make predictions based on the similarity between the input and prototypical parts of training samples, called prototypes. Such models can explain each decision by pointing to the prototypes that contributed the most to the final outcome. Because the explanation is a core part of the prediction, they are often described as "interpretable by design". While promising, we show that such explanations are sometimes misleading, which hampers their usefulness in safety-critical contexts. In particular, several instances may lead to different predictions and yet have the same explanation. Drawing inspiration from the field of formal eXplainable AI (FXAI), we propose Abductive Latent Explanations (ALEs), a formalism to express sufficient conditions on the intermediate (latent) representation of the instance that imply the prediction. Our approach combines the inherent interpretability of case-based reasoning models and the guarantees provided by formal XAI. We propose a solver-free and scalable algorithm for generating ALEs based on three distinct paradigms, compare them, and demonstrate the feasibility of our approach on diverse datasets for both standard and fine-grained image classification. The associated code can be found at https://github.com/julsoria/ale
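One of the geometric paradigms named in the TL;DR is the triangle inequality. The sketch below is only an illustration of that general fact, not the paper's algorithm: given the distance from a latent vector to one prototype and the distance between two prototypes, it bounds the distance to the second prototype without measuring it. The names `distance_bounds`, `p_i`, `p_j`, and `z`, as well as the 2-D coordinates, are hypothetical and chosen for readability.

```python
import math


def distance_bounds(d_z_pi, d_pi_pj):
    """Bound d(z, p_j) from d(z, p_i) and d(p_i, p_j) via the triangle inequality.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    lower = abs(d_z_pi - d_pi_pj)   # reverse triangle inequality
    upper = d_z_pi + d_pi_pj        # triangle inequality
    return lower, upper


# Toy 2-D latent space (assumed values, not from any dataset).
p_i = (0.0, 0.0)   # prototype whose distance to z is known
p_j = (3.0, 4.0)   # prototype whose distance to z we want to bound
z = (1.0, 0.0)     # latent representation of the input

lower, upper = distance_bounds(math.dist(z, p_i), math.dist(p_i, p_j))
true_distance = math.dist(z, p_j)
print(lower, upper, true_distance)
```

The true distance always falls inside the computed interval, so an explanation that fixes only a few prototype distances still constrains the remaining ones.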

Paper Structure

This paper contains 25 sections, 4 theorems, 41 equations, 2 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

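Let $H_1$ and $H_2$ be two hyperspheres in a Euclidean space $\mathbb{R}^{D}$ with centers $\mathbf{C}_1, \mathbf{C}_2$ and radii $r_1, r_2$, respectively. Let their surfaces intersect in a non-empty set $S_{int}$, defined as $S_{int} = \{\mathbf{x} \in \mathbb{R}^{D} : \|\mathbf{x} - \mathbf{C}_1\| = r_1 \text{ and } \|\mathbf{x} - \mathbf{C}_2\| = r_2\}$. Let $H_3$ be the approximating hypersphere with center $\mathbf{C}_3$ and radius $r_3$ as defined in Definition 4. Then $H_3$ contains $S_{int}$ (containment), and it is the smallest hypersphere that does so (minimality).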
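A minimal sketch of the geometry behind Theorem 1, assuming that the approximating hypersphere is the standard one: its center is the point on the line through $\mathbf{C}_1$ and $\mathbf{C}_2$ where the two surfaces cross, and its radius is the radius of the crossing set. The exact form of Definition 4 is not reproduced on this page, so the formula and the function name `intersection_sphere` are assumptions rather than the authors' implementation.

```python
import math


def intersection_sphere(c1, r1, c2, r2):
    """Smallest sphere enclosing the intersection of two sphere surfaces.

    Assumed construction (standard geometry), not copied from the paper.
    """
    d = math.dist(c1, c2)
    # The surfaces intersect only if the centers are neither too far apart
    # nor so close that one sphere lies strictly inside the other.
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        raise ValueError("sphere surfaces do not intersect")
    # Signed distance from c1 to the hyperplane containing the intersection.
    a = (d * d + r1 * r1 - r2 * r2) / (2 * d)
    c3 = tuple(x1 + a * (x2 - x1) / d for x1, x2 in zip(c1, c2))
    r3 = math.sqrt(max(r1 * r1 - a * a, 0.0))
    return c3, r3


# Toy example in R^2: the two circles meet at (3, 4) and (3, -4).
c3, r3 = intersection_sphere((0.0, 0.0), 5.0, (6.0, 0.0), 5.0)
print(c3, r3)
```

In this toy case both intersection points lie at distance $r_3$ from $\mathbf{C}_3$, which is why no smaller enclosing sphere exists.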

Figures (2)

  • Figure 1: Example of a top-$1$ explanation for a ProtoPNet with five prototypical parts for two classes.
  • Figure 2: Example (in $\mathbb{R}^2$) of a hypersphere $H_{int}$ containing the intersection between the hypersphere $H_1$ and $H_2$.

Theorems & Definitions (15)

  • Definition 1: Abductive Latent Explanation (ALE)
  • Definition 2: Subset-Minimal ALE
  • Definition 3: ProtoPNet Explanation
  • Definition 4: Hypersphere Intersection Approximation
  • Theorem 1: Containment and Minimality of the Hypersphere Intersection Approximation
  • proof
  • Definition 5: Maximally Class-Favoring Element within $\mathbf{a}_{\mathcal{E}}$
  • Definition 6: Class-wise Prediction Domination within $\mathbf{a}_{\mathcal{E}}$
  • Definition 7: Total Prediction Domination within $\mathbf{a}_{\mathcal{E}}$ (Explanation Verification)
  • Theorem 2: Verified Explanation Sufficiency
  • ...and 5 more