Integrating White and Black Box Techniques for Interpretable Machine Learning

Eric M. Vernon; Naoki Masuyama; Yusuke Nojima

TL;DR

The paper tackles the interpretability-accuracy trade-off in machine learning by introducing a three-component ensemble: a white-box base classifier for easy inputs, a black-box deferral classifier for hard inputs, and a white-box grader that routes each input to one of the two. Training relabels each training pattern as 'easy' or 'hard' according to the base classifier's performance, balances the relabeled dataset with SMOTE, and then trains the grader on it; at inference time, the grader decides which classifier evaluates each new input. Empirical results on multiple OpenML datasets demonstrate that the approach can maintain high final accuracy while providing interpretable reasoning for easy cases and a transparent justification for when a more complex model is required. The method offers a practical pathway to deploying high-performing yet interpretable systems in real-world settings, with potential extensions to other white-box/gradient-boosted configurations and user-facing visualization tools.
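The routing scheme summarized above can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_ensemble`, `predict_ensemble`, `oversample_minority`), the tree depths, the synthetic `make_moons` data, and the choice to train the deferral classifier on the full training set are all hypothetical. A pattern is treated as 'hard' when the base classifier misclassifies it, which is one reading of "based on the base classifier's performance". To keep the sketch dependency-free, plain random oversampling stands in for SMOTE; with imbalanced-learn installed, `SMOTE().fit_resample(X, hard)` would be the closer match.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text


def oversample_minority(X, labels, seed=0):
    """Duplicate random minority rows until both classes are equally frequent.

    Stand-in for SMOTE (an assumption of this sketch): it copies existing
    rows instead of synthesizing new ones by interpolation.
    """
    counts = np.bincount(labels, minlength=2)
    if counts.min() == 0 or counts[0] == counts[1]:
        return X, labels  # nothing to balance
    minority = int(np.argmin(counts))
    rng = np.random.default_rng(seed)
    extra = rng.choice(np.flatnonzero(labels == minority),
                       size=int(counts.max() - counts.min()), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([labels, labels[extra]])


def fit_ensemble(X, y, seed=0):
    """Train the base (white box), deferral (black box), and grader (white box)."""
    base = DecisionTreeClassifier(max_depth=2, random_state=seed).fit(X, y)
    # Assumption: the deferral classifier sees the full training set.
    deferral = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    # Relabel: 1 = 'hard' (base is wrong), 0 = 'easy' (base is right).
    hard = (base.predict(X) != y).astype(int)
    X_bal, hard_bal = oversample_minority(X, hard, seed)
    grader = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(X_bal, hard_bal)
    return base, deferral, grader


def predict_ensemble(models, X):
    """Route each input: the grader decides, then base or deferral predicts."""
    base, deferral, grader = models
    is_hard = grader.predict(X).astype(bool)
    y_pred = base.predict(X)
    if is_hard.any():
        y_pred[is_hard] = deferral.predict(X[is_hard])
    return y_pred, is_hard


# Synthetic 2-D data, used only to exercise the sketch.
X, y = make_moons(n_samples=600, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = fit_ensemble(X_tr, y_tr)
y_pred, is_hard = predict_ensemble(models, X_te)

print(f"ensemble accuracy: {np.mean(y_pred == y_te):.3f}")
print(f"share handled by the white-box base: {1 - is_hard.mean():.1%}")
# The grader is itself a small tree, so its routing rule can be read directly.
print(export_text(models[2], feature_names=["x1", "x2"]))
```

Because the grader is a shallow decision tree, the printed rules describe in plain feature thresholds which region of the input space is handed to the black-box model, which is the kind of justification the summary refers to.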

Abstract

In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms that are simpler and easier for humans to comprehend tend to perform worse than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly interpretable classifier (i.e., a white box model), and more difficult inputs using a more powerful but less interpretable classifier (i.e., a black box model).

Paper Structure (14 sections, 4 figures, 1 table)

Introduction
Research Background
  Interpretability in Machine Learning
  Classification with a Reject Option
Proposed Method
  Overview
  Training
  Evaluating New Inputs
  Data Resampling
  2-D Example
Computational Experiments
  Experiment Design
  Experimental Results
Conclusion and Future Work

Figures (4)

Figure 1: The two-step process of evaluating new inputs.
Figure 2: Example decision boundaries when using a decision tree (left) and a random forest (right) classifier. Points which fall within the shaded region are considered "hard", and are labeled with the random forest. The remainder are considered "easy" and are evaluated using the decision tree. The shaded region itself is the output of another decision tree.
Figure 3: Decision trees for the base classifier (left) and the grader (right). Patterns which the grader considers "easy" are evaluated using the base classifier, while "hard" patterns are evaluated using a more complex deferral classifier.
Figure 4: A textual representation of the base classifier (left) and grader (right) decision trees for the "Gas Sensor Array Drift" dataset. While the dataset has 128 features, the majority of patterns can be correctly classified using only a small subset of features. Moreover, the set of patterns which are difficult to classify is just as easily described.
