Distilling a Neural Network Into a Soft Decision Tree
Nicholas Frosst, Geoffrey Hinton
TL;DR
This paper tackles the challenge of interpreting deep neural networks by distilling their input–output behavior into a soft decision tree, whose hierarchical decisions yield explanations alongside predictions. The authors introduce this model as a hierarchical mixture of bigots. They train it with gradient-based optimization plus a regularizer that encourages each inner node to make balanced use of its two subtrees. They also demonstrate that distillation can improve generalization relative to trees trained directly on the data, while sacrificing some accuracy compared to the original neural network. They validate the approach on MNIST and additional datasets, showing improved explainability through path-based reasoning and interpretable filters. The work highlights a practical path to explainable AI: use a powerful neural net to train an explicitly interpretable model that retains much of the net's predictive power and offers fast, tractable inference at test time.
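As a rough illustration of the hierarchical mixture of bigots, here is a minimal NumPy sketch. It assumes a complete binary tree stored in breadth-first order. Each inner node i routes an input to its right child with probability sigmoid(beta * (x·w_i + b_i)). Each leaf (a "bigot") holds one fixed class distribution. A penalty pushes each inner node toward using both subtrees equally. The function names, the breadth-first layout, and the default values of `beta` and `lam` are hypothetical choices for this sketch, not the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_tree_forward(X, W, b, phi, beta=1.0):
    """Forward pass of a complete binary soft decision tree (illustrative sketch).

    X:    (n, d) inputs
    W, b: (num_inner, d) filters and (num_inner,) biases, inner nodes in breadth-first order
    phi:  (num_inner + 1, k) leaf logits; each leaf holds one fixed class distribution
    beta: assumed inverse temperature applied to every inner-node activation
    """
    n, num_inner = X.shape[0], W.shape[0]
    assert phi.shape[0] == num_inner + 1, "a binary tree has one more leaf than inner nodes"
    p_right = sigmoid(beta * (X @ W.T + b))  # (n, num_inner): prob. of taking the right branch
    # reach[:, j] = probability of arriving at node j; node i has children 2i+1 (left), 2i+2 (right)
    reach = np.zeros((n, 2 * num_inner + 1))
    reach[:, 0] = 1.0  # every input starts at the root
    for i in range(num_inner):
        reach[:, 2 * i + 1] = reach[:, i] * (1.0 - p_right[:, i])
        reach[:, 2 * i + 2] = reach[:, i] * p_right[:, i]
    path_prob = reach[:, num_inner:]  # leaves are the last num_inner + 1 nodes
    leaf_dist = softmax(phi, axis=1)  # (num_leaves, k)
    return path_prob, reach[:, :num_inner], p_right, leaf_dist


def predict_max_path(path_prob, leaf_dist):
    """Predict with the single most probable leaf, so the decision follows one root-to-leaf path."""
    best_leaf = np.argmax(path_prob, axis=1)
    return np.argmax(leaf_dist[best_leaf], axis=1)


def balance_penalty(reach_inner, p_right, lam=0.1):
    """Penalize inner nodes whose reach-weighted right-branch probability drifts from 0.5."""
    alpha = (reach_inner * p_right).sum(axis=0) / reach_inner.sum(axis=0)
    alpha = np.clip(alpha, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    return float(-lam * np.sum(0.5 * np.log(alpha) + 0.5 * np.log(1.0 - alpha)))
```

Training would add `balance_penalty` to a classification loss and minimize the sum with any gradient-based optimizer. This sketch only shows the forward computations and applies no depth-dependent weighting to the penalty. The penalty is smallest, `lam * num_inner * log(2)`, when every inner node sends half of its incoming probability mass each way.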
Abstract
Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.
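The distillation step can be sketched with the same caveats. Assume the trained net's logits are softened with a temperature, as in standard knowledge distillation. The tree is then fitted to those soft targets by minimizing each leaf's cross-entropy against the target, weighted by the probability of reaching that leaf. The temperature value, the function names, and this exact form of the loss are illustrative assumptions. The path probabilities and leaf distributions are taken as inputs so the snippet stands alone.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_targets(teacher_logits, temperature=2.0):
    """Temperature-softened class probabilities from the trained net (assumed setup)."""
    return softmax(teacher_logits / temperature, axis=1)


def tree_distillation_loss(path_prob, leaf_dist, targets, eps=1e-12):
    """Mean over examples of sum_l P^l(x) * CE(targets(x), Q^l).

    path_prob: (n, L) probability of reaching each leaf
    leaf_dist: (L, k) each leaf's class distribution Q^l
    targets:   (n, k) soft targets produced by the neural net
    """
    log_q = np.log(leaf_dist + eps)        # (L, k)
    ce_per_leaf = -(targets @ log_q.T)     # (n, L): cross-entropy of each leaf vs. the target
    return float(np.mean(np.sum(path_prob * ce_per_leaf, axis=1)))
```

A higher temperature spreads the net's probability mass over more classes. Each example then also carries the net's view of which wrong classes are similar, not just a single hard label.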
