Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

Cynthia Rudin

TL;DR

The paper argues against relying on post hoc explanations for black-box models in high-stakes decisions and makes a case for inherently interpretable models. It surveys fundamental issues with explainable ML, including faithfulness, misinterpretation, and integration with external information, and discusses governance and policy implications to incentivize interpretable solutions. It outlines concrete algorithmic approaches (e.g., CORELS, RiskSLIM) and domain-specific interpretable strategies (such as prototype-based networks in vision), underpinned by the Rashomon set concept to explain why accurate interpretable models can exist. The work advocates shifting from explainability to interpretability as a practical, policy-informed path to safer, more trustworthy decision-making systems.

Abstract

Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward -- it is to design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.

Paper Structure

This paper contains 11 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: A fictional depiction of the "accuracy-interpretability trade-off," taken from the DARPA XAI (Explainable Artificial Intelligence) Broad Agency Announcement [XAIBAA].
  • Figure 2: Saliency does not explain anything except where the network is looking. We have no idea why this image is labeled as either a dog or a musical instrument when considering only saliency. The explanations look essentially the same for both classes. Figure credit: Chaofan Chen and [Checkermallow].
  • Figure 3: This is a machine learning model from the Certifiably Optimal Rule Lists (CORELS) algorithm [angelino2018]. This model is the minimizer of a special case of Equation (eq:optim), discussed later in the challenges section. CORELS' code is open source and publicly available at http://corels.eecs.harvard.edu/, along with the data from Florida needed to produce this model.
  • Figure 4: Scoring system for risk of recidivism from [RudinUs18, UstunRu2017KDD, ZengUsRu2017, ustun2015slim]. This model was not created by a human; the selection of numbers and features comes from the RiskSLIM machine learning algorithm.
  • Figure 5: Image from the authors of [ChenEtAl18], indicating that parts of the test image on the left are similar to prototypical parts of training examples. The test image to be classified is on the left, the most similar prototypes are in the middle column, and the heatmaps that show which part of the test image is similar to the prototype are on the right. We included copies of the test image on the right so that it is easier to see what part of the bird the heatmaps are referring to. The similarities of the prototypes to the test image are what determine the predicted class label of the image. Here, the image is predicted to be a clay-colored sparrow. The top prototype seems to be comparing the bird's head to a prototypical head of a clay-colored sparrow, the second prototype considers the throat of the bird, the third looks at feathers, and the last seems to consider the abdomen and leg. Test image from [Omalley]. Prototypes from [ksblack99, Schmierer17, Schmierer15, Schmierer15a]. Image constructed by Alina Barnett.
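
The captions for Figures 3 and 4 name two interpretable model forms: a rule list (as produced by CORELS) and a points-based scoring system (as produced by RiskSLIM). The sketch below illustrates only what those two forms look like when written out as code. Every rule, threshold, point value, and the intercept are hypothetical placeholders chosen for illustration; they are not the models shown in the figures, and nothing here calls the CORELS or RiskSLIM software. Mapping the total score to a risk through a logistic function is likewise an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], bool], int]

# Hypothetical rule list (NOT the model in Figure 3): the first matching rule wins.
RULE_LIST: List[Rule] = [
    ("age < 21 and priors >= 1", lambda r: r["age"] < 21 and r["priors"] >= 1, 1),
    ("priors > 3", lambda r: r["priors"] > 3, 1),
]
DEFAULT_PREDICTION = 0  # applied when no rule matches


def predict_rule_list(record: Record) -> Tuple[int, str]:
    """Return the prediction and the rule that produced it."""
    for description, condition, prediction in RULE_LIST:
        if condition(record):
            return prediction, description
    return DEFAULT_PREDICTION, "default rule"


# Hypothetical scoring system (NOT the model in Figure 4): small integer points.
POINTS: List[Rule] = [
    ("priors >= 2", lambda r: r["priors"] >= 2, 1),
    ("priors >= 5", lambda r: r["priors"] >= 5, 1),
    ("age >= 40", lambda r: r["age"] >= 40, -1),
]
INTERCEPT = -1  # hypothetical offset added to every score


def score(record: Record) -> int:
    """Sum the points of every condition the record satisfies."""
    return INTERCEPT + sum(points for _, cond, points in POINTS if cond(record))


def risk(record: Record) -> float:
    """Map the integer score to a probability with a logistic function."""
    return 1.0 / (1.0 + math.exp(-score(record)))


if __name__ == "__main__":
    person = {"age": 19, "priors": 1}
    print(predict_rule_list(person))
    print(score(person), round(risk(person), 3))
```

In both forms the prediction can be traced by hand to the specific conditions that fired, which is the property the paper contrasts with post hoc explanations of a black box.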