Explainable Neural Networks with Guarantees: A Sparse Estimation Approach
Antoine Ledent, Peng Liu
TL;DR
SparXnet addresses the tension between predictive power and interpretability in neural networks with a sparse, explainable architecture: a softmax-based routing mechanism automatically selects a small set of input features, and the model learns $K$ one-dimensional Lipschitz transformation functions, one per selected feature. The final prediction is a linear combination of these transformed features, so the model is directly interpretable through its feature importances and per-feature effects. The paper proves a generalization bound in which the sample complexity scales with the number of selected features $K$ and the Lipschitz constants, depends only logarithmically on the total number of input features $d$, and is independent of the number of parameters. Empirical results on synthetic and real datasets show that SparXnet achieves competitive or superior performance while producing much sparser and more interpretable models, demonstrating its practical potential for high-stakes domains such as healthcare and finance.
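To make the pipeline described above concrete, here is a minimal Python sketch of a SparXnet-style forward pass. It is not the authors' implementation. The soft selection (a temperature-controlled, softmax-weighted average of the $d$ inputs), the parameterization of each one-dimensional Lipschitz function as a sum of ReLU hinges at fixed knots, and all names (`Unit`, `predict`, `selected_feature`, `lipschitz_bound`) are hypothetical choices made for illustration. Training, temperature schedules, and any sparsity regularization are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; lower temperature gives a sharper distribution."""
    scaled = [v / temperature for v in logits]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Unit:
    """Hypothetical SparXnet-style unit (illustrative assumption, not the paper's code):
    a router over the d inputs, a 1-D Lipschitz transform g, and an output weight beta."""
    routing_logits: List[float]  # one logit per input feature (length d)
    knots: List[float]           # hinge locations b_m of g (assumed fixed)
    slopes: List[float]          # hinge coefficients a_m of g
    beta: float                  # coefficient in the final linear combination

    def select(self, x: Sequence[float], temperature: float) -> float:
        # Soft feature selection: softmax-weighted average of the inputs.
        weights = softmax(self.routing_logits, temperature)
        return sum(w * xj for w, xj in zip(weights, x))

    def transform(self, z: float) -> float:
        # g(z) = sum_m a_m * relu(z - b_m), a 1-D piecewise-linear function.
        return sum(a * max(0.0, z - b) for a, b in zip(self.slopes, self.knots))

    def lipschitz_bound(self) -> float:
        # Each relu(z - b) is 1-Lipschitz, so g is (sum_m |a_m|)-Lipschitz.
        return sum(abs(a) for a in self.slopes)

    def selected_feature(self) -> int:
        # Index of the input with the largest routing logit (hard read-out).
        return max(range(len(self.routing_logits)),
                   key=lambda j: self.routing_logits[j])


def predict(units: Sequence[Unit], x: Sequence[float],
            bias: float = 0.0, temperature: float = 0.1) -> float:
    """Prediction = bias + sum_k beta_k * g_k(z_k), z_k = unit k's selected input."""
    return bias + sum(u.beta * u.transform(u.select(x, temperature)) for u in units)


# Toy example with d = 4 inputs and K = 2 units (all values made up).
units = [
    Unit(routing_logits=[5.0, 0.0, 0.0, 0.0], knots=[0.0], slopes=[2.0], beta=1.5),
    Unit(routing_logits=[0.0, 0.0, 5.0, 0.0], knots=[-1.0, 1.0],
         slopes=[1.0, -1.0], beta=-0.5),
]
x = [1.0, 3.0, 0.5, -2.0]

print(round(predict(units, x), 6))  # 2.25 (= 1.5 * 2.0 - 0.5 * 1.5)
for k, u in enumerate(units):
    print(f"unit {k}: feature {u.selected_feature()}, |beta| = {abs(u.beta)}, "
          f"Lipschitz <= {u.lipschitz_bound()}")
```

At a low temperature each router places almost all of its mass on one input, so unit $k$ reads off a single feature index, $|\beta_k|$ acts as an importance weight, and plotting $g_k$ shows that feature's effect. The ReLU-hinge form is only one convenient way to keep the Lipschitz constant explicit (at most $\sum_m |a_m|$); other bounded-slope parameterizations would serve the same purpose in this sketch.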
Abstract
Balancing predictive power and interpretability has long been a challenging research problem, particularly for powerful yet complex models such as neural networks, where nonlinearity obstructs direct interpretation. This paper introduces a novel approach to constructing an explainable neural network that harmonizes predictiveness and explainability. Our model, termed SparXnet, is designed as a linear combination of a sparse set of jointly learned features, each derived by applying a different trainable function to a single one-dimensional input feature. Leveraging the ability of neural networks to learn arbitrarily complex relationships, our architecture automatically selects a sparse set of important features, and the final prediction is a linear combination of rescaled versions of these features. Extensive experiments on synthetic and real-world datasets demonstrate that the model selects significant features while maintaining comparable predictive performance and offering direct interpretability. We also provide a theoretical analysis of the generalization bounds of our framework, which are favorably linear in the number of selected features and only logarithmic in the number of input features. Under very mild conditions, we further remove any dependence of the sample complexity on the number of parameters or other architectural details. Our work paves the way for further research on sparse and explainable neural networks with guarantees.
