Are machine learning interpretations reliable? A stability study on global interpretations

Luqin Gan, Tarek M. Zikry, Genevera I. Allen

TL;DR

This paper investigates the reliability of machine learning interpretations by empirically assessing the stability of global IML methods across supervised and unsupervised tasks on tabular data. It defines stability as a practical proxy for reliability and develops a large-scale framework that perturbs data through random splits and noise to measure within- and between-method interpretation consistency, alongside predictive accuracy. The study finds that interpretations are often unstable, that no single method consistently outperforms the others across datasets, and that higher prediction accuracy does not guarantee more stable explanations. To enable reproducibility and practical use, the authors release an open-source dashboard and Python package that let researchers evaluate interpretation stability on their own data, and they advocate reporting stability alongside accuracy in high-stakes decisions.

Abstract

As machine learning systems are increasingly used in high-stakes domains, there is a growing emphasis placed on making them interpretable to improve trust in these systems. In response, a range of interpretable machine learning (IML) methods have been developed to generate human-understandable insights into otherwise black box models. With these methods, a fundamental question arises: Are these interpretations reliable? Unlike with prediction accuracy or other evaluation metrics for supervised models, the proximity to the true interpretation is difficult to define. Instead, we ask a closely related question that we argue is a prerequisite for reliability: Are these interpretations stable? We define stability as findings that are consistent or reliable under small random perturbations to the data or algorithms. In this study, we conduct the first systematic, large-scale empirical stability study on popular machine learning global interpretations for both supervised and unsupervised tasks on tabular data. Our findings reveal that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves, and that there is no association between the accuracy of machine learning predictions and the stability of their associated interpretations. Moreover, we show that no single method consistently provides the most stable interpretations across a range of benchmark datasets. Overall, these results suggest that interpretability alone does not warrant trust, and underscore the need for rigorous evaluation of interpretation stability in future work. To support these principles, we have developed and released an open-source IML dashboard and Python package to enable researchers to assess the stability and reliability of their own data-driven interpretations and discoveries.

Paper Structure

This paper contains 71 sections, 9 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Overview of study design with data splitting. After multiple random data splits, interpretations and test predictions are generated using different IML methods on each training set. Then the within-method stability, the between-method stability, and the average prediction accuracy on the test set are computed.
  • Figure 2: IML Performance on Classification Tasks. A: Heatmap of within-method interpretation stability. B: Bump plot of IML methods ranked by the level of interpretation stability. C: Heatmap of between-method interpretation stability. D: Heatmap of between-method prediction accuracy on test sets. E: Heatmap of prediction stability on test sets. F: Scatterplot of accuracy and interpretation stability, colored by data sets, with fitted OLS lines aggregated over data. G: Scatterplot of accuracy and interpretation stability, colored by data sets, with fitted OLS lines aggregated over IML methods.
  • Figure 3: IML Performance on Regression Tasks. A: Heatmap of within-method interpretation stability. B: Bump plot of IML methods ranked by the level of interpretation stability. C: Heatmap of between-method interpretation stability. D: Heatmap of between-method prediction accuracy on test sets. E: Heatmap of prediction stability on test sets. F: Scatterplot of accuracy and interpretation stability, colored by data sets, with fitted OLS lines aggregated over data. G: Scatterplot of accuracy and interpretation stability, colored by data sets, with fitted OLS lines aggregated over IML methods.
  • Figure 4: IML Performance on Clustering Methods. A: Heatmap of within-method interpretation stability. B: Bump plot of IML methods ranked by the level of interpretation stability. C: Heatmap of between-method interpretation stability. D: Heatmap of between-method prediction accuracy.
  • Figure 5: IML Performance on Dimension Reduction Methods. A: Heatmap of within-method interpretation stability. B: Bump plot of IML methods ranked by the level of interpretation stability. C: Heatmap of between-method interpretation stability. D: Heatmap of between-method prediction accuracy.
  • ...and 9 more figures
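
The data-splitting design summarized in the Figure 1 caption can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic data, the absolute Pearson correlation used as a stand-in global feature importance, the top-k feature set as the "interpretation", the mean pairwise Jaccard similarity as the stability score, and all function names. The paper's own metrics and its released package may differ.

```python
import random
from itertools import combinations


def make_data(n=200, p=8, seed=0):
    """Synthetic tabular data (assumed): only features 0 and 1 drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y


def abs_correlation_importance(X, y):
    """Stand-in global interpretation: |Pearson correlation| per feature."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        scores.append(abs(cov) / (var_x * var_y) ** 0.5)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, as a set."""
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def within_method_stability(X, y, n_splits=20, train_frac=0.7, k=2, seed=1):
    """Mean pairwise Jaccard similarity of top-k sets over random splits."""
    rng = random.Random(seed)
    n = len(X)
    selected = []
    for _ in range(n_splits):
        # Random training subset; the held-out rows would serve as the test set.
        idx = rng.sample(range(n), int(train_frac * n))
        X_tr = [X[i] for i in idx]
        y_tr = [y[i] for i in idx]
        selected.append(top_k(abs_correlation_importance(X_tr, y_tr), k))
    pairs = list(combinations(selected, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


X, y = make_data()
stability = within_method_stability(X, y)
print(f"within-method stability (mean pairwise Jaccard): {stability:.3f}")
```

A score near 1 means the same features are selected on almost every split, and a score near 0 means the selected features change from split to split. Between-method stability could be sketched the same way by comparing the top-k sets of two different importance functions on the same training split.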