
An Overview of Prototype Formulations for Interpretable Deep Learning

Maximilian Xiling Li, Korbinian Franz Rudolf, Paul Mattes, Nils Blank, Rudolf Lioutikov

TL;DR

The paper addresses the interpretability gap in deep vision models by systematically evaluating prototype-based approaches. It introduces HyperPG, a probabilistic prototype representation on the hypersphere that models a Gaussian over cosine similarities, and benchmarks it against Euclidean and cosine prototypes across multiple datasets. Results show hyperspherical, probabilistic formulations offer competitive or superior performance with markedly reduced sensitivity to training hyperparameters, especially under simplified optimization regimes, and provide robust interpretability through prototype activations. The findings suggest that hyperspherical and probabilistic prototypes enhance practical deployment of interpretable deep learning, with avenues for extending to mixture models and Bayesian variants.

Abstract

Prototypical part networks offer interpretable alternatives to black-box deep learning models by learning visual prototypes for classification. This work provides a comprehensive analysis of prototype formulations, comparing point-based and probabilistic approaches in both Euclidean and hyperspherical latent spaces. We introduce HyperPG, a probabilistic prototype representation using Gaussian distributions on hyperspheres. Experiments on CUB-200-2011, Stanford Cars, and Oxford Flowers datasets show that hyperspherical prototypes outperform standard Euclidean formulations. Critically, hyperspherical prototypes maintain competitive performance under simplified training schemes, while Euclidean prototypes require extensive hyperparameter tuning.

Paper Structure

This paper contains 43 sections, 17 equations, 23 figures, 3 tables.

Figures (23)

  • Figure 1: Different Prototype Formulations. HyperPG is a novel formulation for probabilistic prototypes on a Hypersphere.
  • Figure 2: Illustration of similarity computation between prototype ${\bm{p}}$ and latent vector ${\bm{z}}$ for different formulations. Euclidean: $L_2$ distance in latent space. Hyperspherical: cosine similarity of normalized vectors (angle on hypersphere). Gaussian: PDF of Gaussian distribution in Euclidean space. HyperPG: PDF of Gaussian distribution over cosine similarities (Gaussian on hypersphere surface).
  • Figure 3: HyperPG prototypes learn a distribution of cosine similarities. They use a learnable anchor vector $\boldsymbol{\alpha}$, scalar mean $\mu$ and variance $\sigma^2$. They project a Gaussian distribution of cosine similarities onto the surface of a hypersphere, resulting in ring-shaped activation patterns around the anchor vector.
  • Figure 4: Prototype Learning Architecture. The HyperPG module can be easily exchanged for other prototype formulations such as Euclidean or Cosine prototypes. HyperPG uses a Gaussian distribution as Density Estimator, but other PDFs are possible.
  • Figure 5: CUB-200-2011 Test Accuracy per Epoch with ResNet50 backbone and simplified optimization scheme. Mean and std over 3 random seeds.
  • ...and 18 more figures
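
The four similarity computations named in the captions of Figures 2 and 3 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Euclidean Gaussian is assumed to have an isotropic covariance, and the HyperPG activation is taken to be a univariate Gaussian density evaluated at the cosine similarity between the latent vector and the anchor, which is one straightforward reading of "PDF of Gaussian distribution over cosine similarities". The paper's exact normalization and parameterization may differ.

```python
import math


def l2_distance(p, z):
    """Euclidean prototype: L2 distance between prototype p and latent z."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, z)))


def cosine_similarity(p, z):
    """Hyperspherical prototype: cosine of the angle between p and z."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return dot / (norm_p * norm_z)


def gaussian_pdf_1d(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def isotropic_gaussian_pdf(p, z, var):
    """Gaussian prototype in Euclidean space.

    Assumption: isotropic covariance var * I centred on p; the paper may
    use a richer covariance structure.
    """
    d = len(p)
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, z))
    return math.exp(-sq_dist / (2.0 * var)) / ((2.0 * math.pi * var) ** (d / 2.0))


def hyperpg_activation(anchor, z, mean, var):
    """HyperPG-style activation (assumed form): Gaussian density over the
    cosine similarity between z and the anchor vector."""
    return gaussian_pdf_1d(cosine_similarity(anchor, z), mean, var)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Two latent vectors at +60 and -60 degrees from the anchor.
    z_plus = [0.5, math.sqrt(3.0) / 2.0]
    z_minus = [0.5, -math.sqrt(3.0) / 2.0]
    for name, z in [("anchor", anchor), ("+60 deg", z_plus), ("-60 deg", z_minus)]:
        print(name, hyperpg_activation(anchor, z, mean=0.5, var=0.01))
```

With a mean of 0.5, the activation in this sketch peaks for latent vectors at 60 degrees from the anchor on either side, not at the anchor itself, which matches the ring-shaped activation pattern described in the Figure 3 caption.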