Intrinsic User-Centric Interpretability through Global Mixture of Experts

Vinitra Swamy; Syrielle Montariol; Julian Blackwell; Jibril Frej; Martin Jaggi; Tanja Käser

TL;DR

This paper tackles the challenge of making neural models more acceptable in human-centered domains by balancing accuracy with actionable, faithful explanations. It introduces InterpretCC, which uses feature gating and a global mixture of experts to produce sparse, instance-specific explanations that are directly used in prediction. Across text, time-series, and tabular data, InterpretCC achieves performance competitive with non-interpretable baselines and outperforms intrinsically interpretable baselines, while a user study with teachers finds its explanations more actionable and trustworthy. The work demonstrates that human-centric interpretability can be achieved without sacrificing predictive quality, enabling more transparent and actionable AI in education and healthcare.

Abstract

In human-centric settings like education or healthcare, model accuracy and model explainability are key factors for user adoption. Toward these two goals, intrinsically interpretable deep learning models have gained popularity, focusing on accurate predictions alongside faithful explanations. However, these approaches fall short in human-centeredness: they often produce nuanced and complex explanations that are not easily actionable for downstream users. We present InterpretCC (interpretable conditional computation), a family of intrinsically interpretable neural networks at a unique point in the design space that optimizes for ease of human understanding and explanation faithfulness, while maintaining performance comparable to state-of-the-art models. InterpretCC achieves this through adaptive sparse activation of features before prediction, allowing the model to use a different, minimal set of features for each instance. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows users to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply InterpretCC to text, time-series, and tabular data across several real-world datasets, demonstrating performance comparable to non-interpretable baselines and outperforming intrinsically interpretable baselines. In a user study involving 56 teachers, InterpretCC explanations were rated as more actionable and useful than those of other intrinsically interpretable approaches.
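The feature-gating idea described above (a sparse, per-instance feature mask computed before prediction) can be sketched as a forward pass for a single instance. This is a minimal, hypothetical illustration, not the authors' implementation: the function names, the single-layer discriminator and predictor, the sigmoid-plus-threshold mask, and the threshold `tau` are all assumptions. A hard threshold is not differentiable, so training such a model would need a relaxation (for example, a straight-through estimator), which this sketch omits.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def feature_gating_forward(x, disc_w, disc_b, pred_w, pred_b, tau=0.5):
    """Forward pass of a hypothetical feature-gating model for one instance.

    Returns the predicted probability and the binary feature mask,
    which doubles as the instance's explanation.
    """
    # (i) The discriminator sees all features and scores each one.
    gate_probs = [sigmoid(z) for z in linear(disc_w, disc_b, x)]
    # Hard, sparse mask: keep only features whose gate probability exceeds tau.
    mask = [1.0 if p > tau else 0.0 for p in gate_probs]
    # (ii) Unselected features are zeroed, so the predictor cannot use them.
    gated_x = [m * xi for m, xi in zip(mask, x)]
    logit = linear(pred_w, pred_b, gated_x)[0]
    return sigmoid(logit), mask


# Toy weights chosen by hand so the discriminator drops the second feature.
x = [1.0, 2.0, 3.0]
disc_w = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
disc_b = [0.0, 0.0, 0.0]
pred_w = [[1.0, 1.0, 1.0]]
pred_b = [-4.0]

prob, mask = feature_gating_forward(x, disc_w, disc_b, pred_w, pred_b)
active_share = sum(mask) / len(mask)  # fraction of features used for this instance
print(prob, mask, active_share)  # -> 0.5 [1.0, 0.0, 1.0] 0.6666666666666666
```

The returned mask is both what the predictor actually consumed and what a user would be shown, which is the sense in which such explanations are used directly in prediction; `active_share` is the per-instance quantity that a sparsity plot like Figure 2 summarizes.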

Paper Structure

This paper contains 40 sections, 5 equations, 20 figures, and 11 tables.

Introduction
Background
Methodology
Feature Gating
Group Routing
Experimental Settings
Experimental Results
Exp. 1: InterpretCC does not compromise on performance
Exp. 2: InterpretCC provides faithful and user-friendly explanations
Exp. 3: InterpretCC explanations are preferred by humans
Discussion and Conclusion
Acknowledgements
Taxonomy of Explanation Design Criteria
Additional Details on Datasets
InterpretCC Group Routing Schema
...and 25 more sections

Figures (20)

Figure 1: InterpretCC Architectures: Feature Gating (left, individual features): (i) All features are input into a discriminator network that outputs a sparse feature activation mask; (ii) Only the features selected via the mask are passed to a predictive network for the final prediction. Group Routing (right, pre-defined feature groups): (i) Features are statically assigned to distinct groups, with each feature routed to only one group; (ii) Features are input to a discriminator network, generating a sparse group activation mask; (iii) Predictions from activated sub-networks (selected via mask) are aggregated by a weighted sum to produce the final output.
Figure 2: InterpretCC Feature Gating Sparsity: % of features activated per data point across five representative datasets.
Figure 3: InterpretCC Group Routing Performance: balanced accuracy (average $\pm$ std) of the routing strategies (paper, pattern, GPT-4) on the EDU datasets, compared with the non-interpretable baseline.
Figure 3: AG News and SST: number of ICC subnetwork activations (left) vs. average activation weights (right), grouped by subnetworks based on the Dewey Decimal Code.
Figure 4: Model score for each user study criterion (average $\pm$ std) and criteria weight according to users' ranking. All scores range from 1 (lowest) to 5 (highest).
...and 15 more figures
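The Group Routing path in Figure 1 (steps i to iii) can be sketched in the same style. Again, this is a hypothetical illustration rather than the authors' code: the group names, the softmax-plus-threshold routing, the renormalization of kept weights, the fallback that always keeps the top-scoring group, and the mean-based stand-in experts are assumptions chosen for readability.

```python
import math


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def group_routing_forward(x, groups, disc_w, disc_b, experts, tau=0.2):
    """Forward pass of a hypothetical group-routing model for one instance.

    groups:  dict mapping group name -> feature indices (one row of disc_w per
             group, in the dict's insertion order).
    experts: dict mapping group name -> callable on that group's features.
    Returns the aggregated prediction and the per-group gate weights.
    """
    # (i) Static assignment: every feature must belong to exactly one group.
    assigned = sorted(i for idx in groups.values() for i in idx)
    if assigned != list(range(len(x))):
        raise ValueError("groups must partition the feature indices")

    # (ii) The discriminator sees all features and scores each group.
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(disc_w, disc_b)]
    weights = softmax(scores)
    # Sparse mask: keep groups at or above tau (always keep the top group),
    # then renormalize the kept weights so they sum to 1.
    best = max(range(len(weights)), key=weights.__getitem__)
    kept = [w if (w >= tau or g == best) else 0.0 for g, w in enumerate(weights)]
    total = sum(kept)
    gate = [k / total for k in kept]

    # (iii) Each active expert sees only its own group's features;
    # the final output is the gate-weighted sum of their predictions.
    output = 0.0
    for (name, idx), g in zip(groups.items(), gate):
        if g > 0.0:
            output += g * experts[name]([x[i] for i in idx])
    return output, dict(zip(groups, gate))


def mean(values):
    return sum(values) / len(values)


x = [1.0, 2.0, 3.0, 4.0]
groups = {"effort": [0, 1], "performance": [2, 3]}  # hypothetical topics
experts = {"effort": mean, "performance": mean}  # stand-in subnetworks
disc_w = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
disc_b = [0.0, 0.0]

output, gate = group_routing_forward(x, groups, disc_w, disc_b, experts)
print(output, gate)  # -> 3.5 {'effort': 0.0, 'performance': 1.0}
```

Because the gate dictionary names which topical subnetworks were active and with what weight, it can be read directly as the instance's explanation; in this toy run the discriminator routes everything to the `performance` group, so the `effort` subnetwork is never evaluated.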
