Obtaining Example-Based Explanations from Deep Neural Networks

Genghua Dong, Henrik Boström, Michalis Vazirgiannis, Roman Bresson

TL;DR

This work addresses the explainability of deep neural networks by introducing EBE-DNN, a method that combines the representational power of DNN embeddings with a k-nearest neighbors search to produce example-based explanations. By extracting embeddings from a chosen layer and retrieving the closest training examples, EBE-DNN provides concise, example-attribution-based explanations while preserving or enhancing predictive accuracy. The empirical study across MNIST, Fashion-MNIST, and CIFAR-10 shows that middle layers often yield the best balance between interpretability and performance, with deeper layers offering more discriminative, category-level attributions. The approach demonstrates that test predictions can be supported by a small, interpretable set of training examples, facilitating trust and transparency in deep learning systems and suggesting avenues for automated layer selection and scalability in future work.

Abstract

Most techniques for explainable machine learning focus on feature attribution, i.e., values are assigned to the features such that their sum equals the prediction. Example attribution is another form of explanation that assigns weights to the training examples, such that their scalar product with the labels equals the prediction. The latter may provide valuable complementary information to feature attribution, in particular in cases where the features are not easily interpretable. Current example-based explanation techniques have targeted a few model types only, such as k-nearest neighbors and random forests. In this work, a technique for obtaining example-based explanations from deep neural networks (EBE-DNN) is proposed. The basic idea is to use the deep neural network to obtain an embedding, which is employed by a k-nearest neighbor classifier to form a prediction; the example attribution can hence straightforwardly be derived from the latter. Results from an empirical investigation show that EBE-DNN can provide highly concentrated example attributions, i.e., the predictions can be explained with few training examples, without reducing accuracy compared to the original deep neural network. Another important finding from the empirical investigation is that the choice of layer to use for the embeddings may have a large impact on the resulting accuracy.
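The idea described in the abstract can be illustrated with a minimal sketch: embed the training and test examples, find the k nearest training embeddings, and assign each of them a weight so that the scalar product of the weights with the one-hot labels gives the prediction. Everything concrete below is an assumption made for illustration and is not taken from the paper: the `embed` function is a single random ReLU layer standing in for a chosen layer of a trained network, the data are synthetic, the distance is Euclidean, and the weighting is uniform (1/k per neighbor).

```python
import numpy as np


def embed(x, layer_weights):
    """Hypothetical stand-in for the activations of a chosen DNN layer."""
    return np.maximum(x @ layer_weights, 0.0)


def example_attribution(train_emb, test_emb, k):
    """Weights over all training examples for one test embedding.

    Assumes uniform k-NN weighting: each of the k nearest training
    examples gets weight 1/k, every other example gets weight 0.
    """
    dists = np.linalg.norm(train_emb - test_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.zeros(len(train_emb))
    weights[nearest] = 1.0 / k
    return weights


def predict_proba(example_weights, onehot_labels):
    """Prediction as the scalar product of example weights and labels."""
    return example_weights @ onehot_labels


rng = np.random.default_rng(0)

# Synthetic training set (assumption): two Gaussian blobs in 4-D.
X_train = np.vstack([
    rng.normal(-2.0, 1.0, size=(20, 4)),
    rng.normal(2.0, 1.0, size=(20, 4)),
])
y_train = np.array([0] * 20 + [1] * 20)
Y_onehot = np.eye(2)[y_train]

# Random weights standing in for a trained layer (assumption).
layer_w = rng.normal(size=(4, 8))

train_emb = embed(X_train, layer_w)
x_test = np.full(4, 2.0)

w = example_attribution(train_emb, embed(x_test, layer_w), k=5)
proba = predict_proba(w, Y_onehot)

# The explanation is the set of training examples with non-zero weight.
explaining_examples = np.flatnonzero(w)
print("class scores:", proba)
print("explaining training indices:", explaining_examples)
```

With this weighting, only k training examples carry non-zero weight, which is one way to read the abstract's notion of a concentrated example attribution. Swapping `embed` for the activations of a different layer changes which neighbors are retrieved, which is where the choice of layer would enter.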

Paper Structure

This paper contains 13 sections, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: The structure of EBE-DNN
  • Figure 2: Example attributions of cat and automobile test images
  • Figure 3: Example attributions of an automobile test image from CIFAR-10
  • Figure 4: Accuracy on MNIST for different layers and number of examples
  • Figure 5: Accuracy on Fashion-MNIST for different layers and number of examples
  • ...and 1 more figure