Seeking Interpretability and Explainability in Binary Activated Neural Networks

Benjamin Leblanc, Pascal Germain

TL;DR

The paper addresses the tension between predictive performance and interpretability in regression on tabular data by introducing binary activated neural networks (BANNs) and a greedy training approach, the Binary Greedy Network (BGN), which builds compact networks layer by layer and neuron by neuron. It strengthens interpretability with SHAP-based explanations adapted to BANNs, enabling assessment of input features, hidden neurons, and connections. Empirically, BGN yields competitive accuracy while producing shallower, sparser predictors and demonstrates superior interpretability relative to regression trees in selected tasks; pruning baselines struggle to match BGN under parameter constraints. Overall, the work proposes a new family of transparent predictors that balance expressiveness and parsimony, with practical impact for tasks where human-understandable models are essential and for providing explanations through SHAP values tailored to BANNs. Future work suggests extending binary activations to multi-label tasks and exploring binary architectures beyond fully connected layers.

Abstract

We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data; more specifically, we provide guarantees on their expressiveness and present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach doesn't need to fix an architecture for the network in advance: it is built one layer at a time, one neuron at a time, leading to predictors that aren't needlessly complex for a given task.
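
To make the notion of feature importance concrete, here is a minimal sketch of exact Shapley values computed by brute-force enumeration on a tiny binary activated network. It is not the efficient procedure the paper refers to: the enumeration is exponential in the number of features, and "absent" features are replaced by a single reference point rather than averaged over a background distribution. The sign activation with outputs in {-1, +1}, the parameter values, and the names `bann_predict` and `shapley_values` are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def bann_predict(x, W1, b1, w2, b2):
    """Single-hidden-layer network: sign activations, then a linear output."""
    hidden = np.where(W1 @ x + b1 >= 0, 1.0, -1.0)  # assumed codes in {-1, +1}
    return float(w2 @ hidden + b2)


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                z = baseline.copy()
                for j in subset:
                    z[j] = x[j]
                without_i = f(z)
                z[i] = x[i]
                with_i = f(z)
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical parameters, chosen only for illustration.
W1 = np.array([[1.0, -1.0], [0.5, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([2.0, -1.0])
b2 = 0.5

f = lambda v: bann_predict(v, W1, b1, w2, b2)
x = np.array([1.0, 2.0])
baseline = np.array([0.0, 0.0])

phi = shapley_values(f, x, baseline)
print(phi, f(x) - f(baseline))
```

The attributions sum to the difference between the prediction at `x` and the prediction at the reference point, which is the efficiency property of Shapley values and a convenient sanity check for any implementation.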

Paper Structure

This paper contains 27 sections, 3 theorems, 24 equations, 7 figures, 5 tables, and 3 algorithms.

Key Result

Theorem

Let $B$ be a BANN of depth $l$ and $S$ a dataset of size $m$. We have [...], where $\mathbf{y}_{\mathbf{p}}^{(k)} = \{y \mid (\mathbf{x},y)\in S \land (L_k \circ \dots \circ L_1)(\mathbf{x}) = \mathbf{p}\}$ for all $k\in\{1,\dots,l\}$.
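
The collections $\mathbf{y}_{\mathbf{p}}^{(k)}$ defined above can be sketched in a few lines: every example of $S$ is passed through the first $k$ layers, and its label is filed under the resulting binary pattern $\mathbf{p}$. The sketch below assumes sign activations with outputs in {-1, +1} and keeps each group as a list so that repeated labels are preserved; both choices, as well as the names `sign_layer` and `labels_by_pattern` and the toy data, are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict

import numpy as np


def sign_layer(W, b):
    """Return a layer mapping an input vector to activations in {-1, +1}."""
    return lambda x: np.where(W @ x + b >= 0, 1, -1)


def labels_by_pattern(S, layers, k):
    """Group the labels of S by the pattern p produced by the first k layers."""
    groups = defaultdict(list)
    for x, y in S:
        out = x
        for layer in layers[:k]:
            out = layer(out)
        groups[tuple(int(v) for v in out)].append(y)
    return dict(groups)


# Toy dataset and a single hypothetical layer L_1 (identity weights, zero bias).
S = [
    (np.array([1.0, 0.0]), 2.0),
    (np.array([2.0, 0.5]), 3.0),
    (np.array([-1.0, 0.0]), -1.0),
]
layers = [sign_layer(np.eye(2), np.zeros(2))]

groups = labels_by_pattern(S, layers, k=1)
print(groups)
```

Every example that shares a pattern at layer $k$ is indistinguishable to the layers that follow, so the labels within one group are exactly the targets the rest of the network has to summarize with a single output.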

Figures (7)

  • Figure 1: The behavior of the different layers of a single-hidden layer BANN with architecture $\mathbf{d} = \langle 2,2,1\rangle$ acting in the input space $\mathcal{X}$. The upper figures correspond to their lower counterparts, with a different point of view. Left: The x-axis and y-axis represent the feature space of a 2D problem, the z-axis being the label space. Middle: the hidden layer of the network separates the input space into four regions with the help of $d_1 = 2$ hyperplanes. Right: predictions rendered by the network as a function of its input; the predictions are constant within each region defined by the hidden layer.
  • Figure 2: Building binary activated neural networks with the BGN algorithm, one neuron at a time, one layer at a time. Only weights are displayed, yet biases are tuned as well.
  • Figure 3: Pruning of a BNN+ network on the Power Plant dataset over 5 random seeds. The dotted lines show where BGN has converged. Vertical lines depict one standard deviation.
  • Figure 4: Pruning of a QN network on the Istanbul Stock USD dataset. The dotted lines show where BGN has converged. Vertical lines depict one standard deviation.
  • Figure 5: A visual representation of a regression tree of depth three trained on the housing dataset, created with the scikit-learn library (Pedregosa et al., 2011).
  • ...and 2 more figures
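
The behavior described in the caption of Figure 1 can be checked numerically. The sketch below builds a network with architecture $\mathbf{d} = \langle 2,2,1\rangle$, samples points in the input space, and verifies that the two hidden hyperplanes produce at most four regions with one constant prediction each. The weights, the sampling range, and the convention of activations in {-1, +1} are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for architecture d = <2, 2, 1>.
W1 = np.array([[1.0, 1.0], [1.0, -1.0]])  # d_1 = 2 hyperplanes in the 2D input space
b1 = np.array([0.0, 0.0])
w2 = np.array([1.5, -0.5])
b2 = 0.25

X = rng.uniform(-1.0, 1.0, size=(1000, 2))
H = np.where(X @ W1.T + b1 >= 0, 1.0, -1.0)  # one binary code per input point
preds = H @ w2 + b2

# Collect the distinct predictions observed inside each region.
region_to_preds = {}
for code, pred in zip(H, preds):
    region_to_preds.setdefault(tuple(code), set()).add(round(float(pred), 12))

for code, values in sorted(region_to_preds.items()):
    print(code, values)
```

Because the output layer only sees the binary code of a point, two inputs falling on the same side of both hyperplanes necessarily receive the same prediction, which is what makes the predictor piecewise constant.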

Theorems & Definitions (6)

  • Theorem
  • Proposition 1
  • Proposition 2
  • Proof of \ref{prop:bin_bnd}
  • Proof of \ref{prop:BGN_min}
  • Proof of \ref{prop:BGN_dec}