Intrinsic User-Centric Interpretability through Global Mixture of Experts
Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser
TL;DR
This paper tackles the challenge of making neural models more acceptable in human-centered domains by balancing accuracy with faithful, actionable explanations. It introduces InterpretCC, which uses feature gating and a global mixture-of-experts to produce sparse, instance-specific explanations that are directly used in prediction. Across text, time-series, and tabular data, InterpretCC performs competitively with non-interpretable baselines and outperforms intrinsically interpretable baselines, while a user study with teachers finds its explanations more actionable and trustworthy. The work demonstrates that human-centric interpretability can be achieved without sacrificing predictive quality, enabling more transparent and actionable AI in education and healthcare.
Abstract
In human-centric settings like education or healthcare, model accuracy and model explainability are key factors for user adoption. Towards these two goals, intrinsically interpretable deep learning models have gained popularity, focusing on accurate predictions alongside faithful explanations. However, a gap remains in the human-centeredness of these approaches, which often produce nuanced and complex explanations that are not easily actionable for downstream users. We present InterpretCC (interpretable conditional computation), a family of intrinsically interpretable neural networks at a unique point in the design space that optimizes for ease of human understanding and explanation faithfulness, while maintaining performance comparable to state-of-the-art models. InterpretCC achieves this through adaptive sparse activation of features before prediction, allowing the model to use a different, minimal set of features for each instance. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows users to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply InterpretCC to text, time series, and tabular data across several real-world datasets, demonstrating performance comparable to non-interpretable baselines and outperforming intrinsically interpretable baselines. In a user study involving 56 teachers, InterpretCC explanations are rated as more actionable and useful than those of other intrinsically interpretable approaches.
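The two mechanisms described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the gates are computed or trained, so a plain sigmoid score with a hard threshold is assumed here, each expert is assumed to be a single logistic unit, and every name (`feature_gate`, `gated_predict`, `group_routed_predict`, `threshold`, and the weight arguments) is hypothetical. The sketch only shows the inference-time idea: a per-instance mask selects a small set of features, or a small set of user-defined feature groups, and only those contribute to the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_gate(x, W_gate, b_gate, threshold=0.5):
    """Return a hard 0/1 mask over the features of one instance x.

    Assumption: gate scores come from a linear layer plus sigmoid,
    and a feature is kept when its score exceeds `threshold`.
    """
    scores = sigmoid(W_gate @ x + b_gate)  # one score per feature
    return (scores > threshold).astype(x.dtype)

def gated_predict(x, W_gate, b_gate, w_pred, b_pred, threshold=0.5):
    """Predict using only the features the gate kept for this instance."""
    mask = feature_gate(x, W_gate, b_gate, threshold)
    prob = sigmoid(w_pred @ (mask * x) + b_pred)
    return float(prob), mask  # the mask doubles as the explanation

def group_routed_predict(x, groups, W_route, b_route, experts, threshold=0.5):
    """Route one instance to a sparse set of topical experts.

    groups:  dict mapping a user-chosen topic name to feature indices
    experts: dict mapping the same names to (weights, bias) of a
             logistic unit that sees only that group's features
    """
    names = list(groups)
    route = sigmoid(W_route @ x + b_route)  # one score per group
    active = route > threshold
    if not active.any():
        # Assumption: always keep at least the top-scoring group.
        active[np.argmax(route)] = True
    weights = np.where(active, route, 0.0)
    weights = weights / weights.sum()  # renormalize over active groups

    prob, used = 0.0, {}
    for k, name in enumerate(names):
        if active[k]:
            w, b = experts[name]
            prob += weights[k] * sigmoid(w @ x[groups[name]] + b)
            used[name] = float(weights[k])
    return float(prob), used  # `used` lists active topics and weights
```

In both functions the returned mask or topic weights are exactly what the prediction was computed from, which is the sense in which such explanations are faithful by construction. Training hard gates end to end needs a differentiable relaxation, which this sketch omits.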
