Seeking Interpretability and Explainability in Binary Activated Neural Networks
Benjamin Leblanc, Pascal Germain
TL;DR
The paper addresses the tension between predictive performance and interpretability in regression on tabular data. It studies binary activated neural networks (BANNs) and proposes a greedy training algorithm, the Binary Greedy Network (BGN), which builds compact networks one layer at a time and one neuron at a time. For explainability, it adapts SHAP values to BANNs so that the relative importance of input features, hidden neurons, and connections can be quantified. Empirically, BGN reaches competitive accuracy while producing shallower, sparser predictors, and on selected tasks its models compare favorably with regression trees in interpretability; pruning baselines struggle to match BGN under parameter constraints. Overall, the work proposes a family of transparent predictors that balance expressiveness and parsimony, relevant to tasks where human-understandable models and faithful explanations are essential. Future work includes extending binary activations to multi-label tasks and exploring binary architectures beyond fully connected layers.
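To make the two ingredients concrete, here is a minimal Python sketch of a fully connected BANN forward pass together with exact Shapley values for its input features. It assumes sign activations with sign(0) = +1, a real-valued linear output layer, and a single baseline input standing in for "absent" features; the names `bann_predict` and `shapley_values` are hypothetical. The enumeration over all feature subsets is exponential in the number of features and is not the efficient SHAP computation the paper refers to; it only illustrates the quantity being computed.

```python
import itertools
import math

import numpy as np


def sign(z):
    # Binary activation; mapping z == 0 to +1 is an assumed convention.
    return np.where(z >= 0, 1.0, -1.0)


def bann_predict(x, hidden, output):
    """Forward pass of a fully connected BANN (hypothetical helper).

    hidden: list of (W, b) pairs, each affine map followed by sign().
    output: (w, b) for the real-valued linear output used in regression.
    """
    h = np.asarray(x, dtype=float)
    for W, b in hidden:
        h = sign(W @ h + b)
    w, b = output
    return float(w @ h + b)


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating every feature subset.

    Features outside a coalition take their value from `baseline`.
    Cost grows as 2**d, so this is only practical for tiny d.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):  # coalition sizes 0 .. d-1
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for subset in itertools.combinations(others, k):
                z = baseline.copy()
                for j in subset:
                    z[j] = x[j]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi


# Usage: two inputs, one hidden layer where each neuron thresholds one feature.
hidden = [(np.eye(2), np.zeros(2))]
output = (np.array([1.0, 2.0]), 0.0)


def f(z):
    return bann_predict(z, hidden, output)


phi = shapley_values(f, x=[1.0, 1.0], baseline=[-1.0, -1.0])
# phi[0] == 2.0 and phi[1] == 4.0, summing to f(x) - f(baseline) = 3 - (-3)
```

Here each hidden neuron reads a single feature, so the attributions are additive and sum to f(x) - f(baseline) = 6, the efficiency property of Shapley values. The same subset-enumeration idea could in principle be applied to hidden neurons by treating their outputs as the "players", at the same exponential cost.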
Abstract
We study the use of binary activated neural networks as interpretable and explainable predictors for regression tasks on tabular data. More specifically, we provide guarantees on their expressiveness and present an approach, based on the efficient computation of SHAP values, for quantifying the relative importance of the features, hidden neurons and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach does not require fixing the network's architecture in advance: the network is built one layer at a time and one neuron at a time, yielding predictors that are not needlessly complex for the given task.
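The idea of growing a network instead of fixing its architecture can be illustrated with a deliberately simplified, hypothetical scheme for a single hidden layer of sign neurons. It is not the paper's BGN algorithm, whose candidate-selection and stopping rules are not given here. At each step the sketch scores candidate neurons (axis-aligned and random directions, with thresholds at quantiles of the projected data) by how much they would reduce the squared error of the current residual, adds the best one, refits the linear output layer by least squares, and stops once the relative improvement falls below `tol`. The functions `greedy_binary_layer` and `predict` and all their parameters are assumptions.

```python
import numpy as np


def sign(z):
    # Binary activation (assumed convention: sign(0) = +1).
    return np.where(z >= 0, 1.0, -1.0)


def greedy_binary_layer(X, y, max_neurons=10, n_random=20, tol=1e-3, seed=0):
    """Grow one hidden layer of sign neurons, one neuron at a time.

    Hypothetical simplified scheme (not the paper's BGN): each step picks the
    candidate neuron most aligned with the current residual, then refits the
    linear output layer by least squares. Growth stops when the relative
    drop in squared error is at most `tol` or `max_neurons` is reached.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    neurons = []                      # accepted hidden neurons as (w, b)
    H = np.ones((n, 1))               # output-layer design matrix, bias first
    coef = np.array([y.mean()])
    sse = float(np.sum((y - y.mean()) ** 2))
    for _ in range(max_neurons):
        residual = y - H @ coef
        # Candidate directions: every input axis plus a few random unit vectors.
        dirs = np.vstack([np.eye(d), rng.normal(size=(n_random, d))])
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        best, best_score = None, 0.0
        for cand in dirs:
            proj = X @ cand
            for t in np.quantile(proj, np.linspace(0.05, 0.95, 19)):
                h = sign(proj - t)
                # SSE reduction if h (with h @ h == n) were fitted alone.
                score = (h @ residual) ** 2 / n
                if score > best_score:
                    best, best_score = (cand, -t), score
        if best is None:
            break
        w, b = best
        H_new = np.column_stack([H, sign(X @ w + b)])
        coef_new, *_ = np.linalg.lstsq(H_new, y, rcond=None)
        new_sse = float(np.sum((y - H_new @ coef_new) ** 2))
        if sse - new_sse <= tol * sse:
            break                     # new neuron no longer pays for itself
        neurons.append((w, b))
        H, coef, sse = H_new, coef_new, new_sse
    return neurons, coef


def predict(X, neurons, coef):
    # Rebuild the hidden representation, then apply the linear output layer.
    cols = [np.ones(len(X))] + [sign(X @ w + b) for w, b in neurons]
    return np.column_stack(cols) @ coef
```

Because each accepted neuron only appends a column to the least-squares problem, training error never increases; the `tol` threshold is what stops the layer from growing once additional neurons bring little improvement, which is the parsimony the greedy construction aims for. Extending this to several layers (for instance, growing a new layer on top of the frozen binary outputs of the previous one) would be a further assumption.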
