Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead
Cynthia Rudin
TL;DR
The paper argues against relying on post hoc explanations of black-box models in high-stakes decisions and makes the case for inherently interpretable models instead. It surveys fundamental problems with explainable ML, including explanations that are not faithful to the underlying model, explanations that are easily misinterpreted, and the difficulty of combining a black box's output with external information. It also discusses governance and policy measures that could incentivize interpretable solutions. It outlines concrete algorithmic approaches (e.g., CORELS, RiskSLIM) and domain-specific interpretable strategies (such as prototype-based networks in vision), and it uses the Rashomon set argument to explain why accurate interpretable models are likely to exist. The work advocates shifting from explainability to interpretability as a practical, policy-informed path to safer, more trustworthy decision-making systems.
Abstract
Black box machine learning models are currently being used for high-stakes decision-making throughout society, causing problems in healthcare, criminal justice, and other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to \textit{explain} black box models, rather than creating models that are \textit{interpretable} in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward: design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.
