Improving Explainability of Softmax Classifiers Using a Prototype-Based Joint Embedding Method

Hilarie Sit, Brendan Keith, Karianne Bergen

TL;DR

The paper addresses the explainability gap in softmax classifiers by introducing a Prototype-Based Joint Embedding (PB&J) method that bases predictions on latent-space distances to training exemplars. PB&J replaces or augments the final layer with a distance-based scoring mechanism: for an anchor embedding $\ell_A$ and each prototype embedding $\ell_i$, it computes the distance $d_i = \lVert \ell_A - \ell_i \rVert_2$ and the similarity $m_i = \log\left((d_i^2 + 1)/(d_i^2 + 10^{-10})\right)$, then forms the class scores $\bar{s} = \bar{m} W^T$ and applies softmax to obtain class probabilities, enabling instance-based explanations. Key contributions include a tunable prototype framework that supports stochastic prototype sampling for explanations and a centroid-based variant for efficient out-of-distribution (OOD) detection. Experiments show competitive accuracy on MNIST, FashionMNIST, CIFAR10, and CUB-200-2011, and improved OOD signaling over standard networks. The approach offers practical benefits for scientific domains requiring transparent predictions and reliable uncertainty estimates; future work explores parts-based prototypes and broader datasets.
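The scoring step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are made-up 2-D vectors, the weight matrix is a hand-picked identity so that each prototype votes only for its own class, and the function names `similarity`, `softmax`, and `classify` are hypothetical.

```python
import math

EPS = 1e-10  # small constant from the formula m = log((d^2 + 1) / (d^2 + eps))


def similarity(anchor, prototype, eps=EPS):
    """Similarity m_i, computed from the squared Euclidean distance d_i^2."""
    d2 = sum((a - p) ** 2 for a, p in zip(anchor, prototype))
    return math.log((d2 + 1.0) / (d2 + eps))


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(anchor, prototypes, weights):
    """Class probabilities from s = m W^T, where weights[c][i] = W[c][i]."""
    m = [similarity(anchor, p) for p in prototypes]
    s = [sum(w * m_i for w, m_i in zip(row, m)) for row in weights]
    return softmax(s)


# Assumed toy data: one prototype per class in a 2-D latent space.
prototypes = [[0.0, 0.0], [3.0, 4.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity W, chosen only for illustration
probs = classify([0.1, 0.0], prototypes, weights)
```

The similarity is large when the anchor sits close to a prototype (it is capped at $\log(10^{10})$ when the distance is zero) and decays toward zero as the distance grows, so nearby prototypes dominate the class scores.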

Abstract

We propose a prototype-based approach for improving explainability of softmax classifiers that provides an understandable prediction confidence, generated through stochastic sampling of prototypes, and demonstrates potential for out-of-distribution (OOD) detection. By modifying the model architecture and training to make predictions using similarities to any set of class examples from the training dataset, we acquire the ability to sample for prototypical examples that contributed to the prediction, which provide an instance-based explanation for the model's decision. Furthermore, by learning relationships between images from the training dataset through relative distances within the model's latent space, we obtain a metric for uncertainty that is better able to detect out-of-distribution data than softmax confidence.
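One way to picture a confidence generated through stochastic sampling of prototypes is the sketch below. It is an assumption-laden illustration rather than the paper's procedure: each draw picks one random training embedding per class as that class's prototype, the predicted class is the one with the most similar prototype, and confidence is the fraction of draws won by each class. The names `sampled_confidence` and `train_by_class`, the toy embeddings, and the number of draws are all hypothetical.

```python
import math
import random
from collections import Counter


def similarity(anchor, prototype, eps=1e-10):
    """Similarity log((d^2 + 1) / (d^2 + eps)) for Euclidean distance d."""
    d2 = sum((a - p) ** 2 for a, p in zip(anchor, prototype))
    return math.log((d2 + 1.0) / (d2 + eps))


def sampled_confidence(anchor, train_by_class, n_draws=200, seed=0):
    """Fraction of random prototype draws in which each class wins."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_draws):
        # Assumed scheme: one randomly drawn training embedding per class.
        drawn = {c: rng.choice(embs) for c, embs in train_by_class.items()}
        winner = max(drawn, key=lambda c: similarity(anchor, drawn[c]))
        votes[winner] += 1
    return {c: votes[c] / n_draws for c in train_by_class}


# Made-up latent embeddings forming two well-separated classes.
train_by_class = {
    "boot": [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1]],
    "coat": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
}
conf = sampled_confidence([0.1, 0.0], train_by_class)
```

For an anchor that lies inside one cluster, every draw favors the same class and the confidence for that class is 1.0; an anchor between clusters would split its votes, and the winning prototypes from each draw can be shown as the instance-based explanation.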

Paper Structure

This paper contains 15 sections, 2 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Model Architecture.
  • Figure 2: Analysis for a challenging image of a coat from FashionMNIST. Model is undecided between coat and shirt, and we show the closest prototypes that resulted in each type of prediction.
  • Figure 3: Analysis for a straightforward image of an ankle boot from FashionMNIST. Model predicts the correct class 100% of the time, and the closest prototypes are visually similar to the test image.
  • Figure 4: Out of distribution detection performance on Two Moons dataset. Colorbar represents confidence of prediction.
  • Figure 5: Posterior distribution and prototypes for first five misclassified test images from FashionMNIST.
  • ...and 3 more figures