
Learning local discrete features in explainable-by-design convolutional neural networks

Pantelis I. Kaplanoglou, Konstantinos Diamantaras

TL;DR

The proposed framework attempts to break the trade-off between performance and explainability by introducing an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism. It shows promise to exceed the performance of the baseline architecture while providing an additional stream of explanations.

Abstract

Our proposed framework attempts to break the trade-off between performance and explainability by introducing an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism. The ExplaiNet model consists of the predictor, which is a high-accuracy CNN with residual or dense skip connections, and the explainer, a probabilistic graph that expresses the spatial interactions of the network neurons. The value on each graph node is a local discrete feature (LDF) vector, a patch descriptor that represents the indices of antagonistic neurons ordered by the strength of their activations, which are learned with gradient descent. Treating LDFs as sequences, we can increase the conciseness of explanations by repurposing EXTREME, an EM-based sequence motif discovery method that is typically used in molecular biology. Having a discrete feature motif matrix for each of the intermediate image representations, instead of a continuous activation tensor, allows us to leverage the inherent explainability of Bayesian networks. By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels and attribute the model's output to global motifs. Moreover, experiments on various tiny image benchmark datasets confirm that our predictor achieves the same level of performance as the baseline architecture for a given count of parameters and/or layers. Our novel method shows promise to exceed the baseline's performance while providing an additional stream of explanations. On the MNIST classification task, which is considered solved, it reaches performance comparable to the state of the art for single models, using a standard training setup and 0.75 million parameters.
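To make the LDF definition above concrete, here is a minimal sketch of the ranking step only: at each spatial position, the indices of the neurons are ordered by activation strength and the strongest few are kept. It is an illustration under stated assumptions, not the authors' implementation. The function names, the cut-off `k`, and the tie-breaking rule are hypothetical, and the sketch does not model the lateral inhibition layer or the gradient-descent training that produces the activations.

```python
from typing import List, Sequence


def local_discrete_feature(activations: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k strongest activations, strongest first.

    `k` is an assumed cut-off; the abstract does not state the LDF length.
    """
    if not 0 < k <= len(activations):
        raise ValueError("k must be between 1 and the number of neurons")
    # Sort indices by descending activation; break ties by lower index
    # (an assumption made here only to keep the output deterministic).
    order = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    return order[:k]


def ldf_map(feature_map, k: int):
    """Apply the ranking at every position.

    feature_map[y][x] is the list of per-neuron activations at that position.
    """
    return [[local_discrete_feature(cell, k) for cell in row] for row in feature_map]


# Toy 1x2 feature map with 4 neurons per position (made-up values).
toy = [[[0.1, 0.9, 0.3, 0.0], [0.5, 0.2, 0.7, 0.6]]]
ldfs = ldf_map(toy, k=3)
print(ldfs)  # [[[1, 2, 0], [2, 3, 0]]]
```

Each resulting vector is a short sequence of integer symbols, which is the form that a sequence motif discovery method can consume.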

Paper Structure

This paper contains 57 sections, 23 equations, 13 figures, 15 tables.

Figures (13)

  • Figure 1: Overview of the ExplaiNet model. A black-box feed-forward (orange arrows) neural network predictor offers high prediction accuracy. Streams of discrete features (green arrows) provide values to the nodes of the probabilistic explainer graph, which uses them to explain predictions and intermediate features. The nodes are mapped to spatial positions of the input at each level.
  • Figure 2: Lateral Inhibition Layer (LIL) placement inside a residual module.
  • Figure 3: Steps of the ExplaiNet framework process for the generation of explanations.
  • Figure 4: Discrete feature mosaics. Left: level 1, mapped to a $7\times7$ area of the input image; right: level 2, mapped to an $11\times11$ area. Middle: Observed causes $C^{(1)}_{x,y}$ in the receptive field of effects $m^{(2)}_{68}$ and $m^{(2)}_{13}$.
  • Figure 5: Matching scores superimposed on the input image for FMotif 19 of level 8 in R-ExplaiNet18-16, along with its logo. At this explanation level we have global feature motifs in a $4\times4$ matrix.
  • ...and 8 more figures
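
The abstract's step of "collecting observations and directly calculating probabilities", together with the causes and effects named in the Figure 4 caption, can be illustrated with a small counting sketch. It estimates P(cause | effect) by relative frequency from pairs of (lower-level motif, higher-level motif). This is an assumption about the general idea, not the paper's exact procedure. The function name and the cause ids are hypothetical; only the effect ids 68 and 13 echo the Figure 4 caption, and the observation counts are made up.

```python
from collections import Counter, defaultdict


def conditional_probabilities(observations):
    """Estimate P(cause | effect) from (cause, effect) pairs by relative frequency."""
    pair_counts = Counter(observations)
    effect_counts = Counter(effect for _, effect in observations)
    table = defaultdict(dict)
    for (cause, effect), n in pair_counts.items():
        # Share of this effect's observations in which the cause was seen.
        table[effect][cause] = n / effect_counts[effect]
    return dict(table)


# Made-up observations: (motif id at level 1, motif id at level 2).
obs = [(3, 68), (3, 68), (7, 68), (5, 13)]
probs = conditional_probabilities(obs)
print(probs[68])  # cause 3 gets 2/3 of the mass, cause 7 gets 1/3
print(probs[13])  # {5: 1.0}
```

Because every entry is a ratio of counts, each probability can be traced back to the observations that produced it.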