Learning Performance Maximizing Ensembles with Explainability Guarantees
Vincent Pisztora, Jia Li
TL;DR
This work tackles the problem of achieving high predictive performance while preserving intrinsic explainability in high-stakes settings by partitioning observations between a glass-box model $g$ and a black-box model $b$ through an ensemble with explainability guarantees (EEG). The authors introduce a ranking-based allocation $a'_q(v)$, governed by the explainability level $q$ and a robust score $r(z)$ that blends sufficiency indicators with absolute prediction error in a logistic form, and prove several optimality properties (maximal sufficient performance, maximal sufficient explainable performance, conditional maximal absolute performance, monotone allocation). Empirically, EEG shows consistent gains across 31 tabular datasets, achieving a cross-dataset PPCR of $37\%$, an average PQEOM of $74\%$, and a 95TQM of $94\%$; in many cases the ensemble matches or exceeds the best component model while maintaining explainability for a large fraction of observations. The results further show that using a comprehensive feature set for the allocator and exploring multiple component-model pairings yields robust allocations, with occasional substantial gains on specific datasets. Overall, EEG offers a principled, model-agnostic approach to balancing performance and explainability, with practical implications for deploying interpretable yet accurate systems in high-stakes domains.
Abstract
In this paper we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation is defined as one that, for any given explainability level (i.e., the proportion of observations for which the explainable model is the prediction function), maximizes the performance of the ensemble on the underlying task and, subject to that maximal ensemble performance condition, maximizes the performance of the explainable model on the observations allocated to it. The proposed method is shown to produce such explainability optimal allocations on a benchmark suite of tabular datasets across a variety of explainable and black box model types. These learned allocations consistently maintain ensemble performance at very high explainability levels (explaining $74\%$ of observations on average) and in some cases even outperform both the component explainable and black box models while improving explainability.
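
To make the notion of an explainability level concrete, the following is a minimal sketch of a rank-based allocation, not the authors' implementation. It assumes some allocator has already produced a per-observation score (the hypothetical `scores` array, higher meaning better suited to the glass box) and simply routes the top-$q$ fraction of observations to the glass box model. The function names `allocate_by_rank` and `ensemble_predict`, and the rounding rule for the number of allocated observations, are illustrative assumptions.

```python
import numpy as np


def allocate_by_rank(scores, q):
    """Return a boolean mask: True where the glass box model predicts.

    scores: hypothetical allocator scores (higher = better suited to glass box).
    q: explainability level in [0, 1], the fraction sent to the glass box.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    k = int(round(q * n))  # assumed rounding rule for the glass box count
    mask = np.zeros(n, dtype=bool)
    if k > 0:
        # Rank descending; the stable sort breaks ties by original index.
        top = np.argsort(-scores, kind="stable")[:k]
        mask[top] = True
    return mask


def ensemble_predict(glass_pred, black_pred, mask):
    """Use the glass box prediction where mask is True, else the black box."""
    return np.where(mask, glass_pred, black_pred)
```

Because the ranking in this sketch does not depend on $q$, the set of observations sent to the glass box at a lower explainability level is always contained in the set at a higher level.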
