Evaluating Explainability in Machine Learning Predictions through Explainer-Agnostic Metrics

Cristian Munoz, Kleyton da Costa, Bernardo Modenesi, Adriano Koshiyama

TL;DR

This paper develops six distinct model-agnostic metrics that quantify the extent to which model predictions can be explained, allowing for a comprehensive evaluation of how models generate their outputs.

Abstract

The rapid integration of artificial intelligence (AI) into various industries has introduced new challenges in governance and regulation, particularly regarding the understanding of complex AI systems. A critical demand from decision-makers is the ability to explain the results of machine learning models, which is essential for fostering trust and ensuring ethical AI practices. In this paper, we develop six distinct model-agnostic metrics designed to quantify the extent to which model predictions can be explained. These metrics measure different aspects of model explainability, ranging from local and global importance to surrogate predictions, allowing for a comprehensive evaluation of how models generate their outputs. Furthermore, by computing our metrics, we can rank models according to explainability criteria such as importance concentration and consistency, prediction fluctuation, and surrogate fidelity and stability, offering a valuable tool for selecting models based not only on accuracy but also on transparency. We demonstrate the practical utility of these metrics on classification and regression tasks, and integrate them into an existing Python package for public use.

Paper Structure

This paper contains 29 sections, 17 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A simplified representation of the explainer-agnostic metrics (EAMEX) framework.
  • Figure 2: Comparison of different feature importance analyses on the Adult Dataset for an ML model.
  • Figure 3: Comparison of different feature importance analyses on the US-Crime Dataset for an ML model.
  • Figure 4: Overall analysis of explainer-agnostic metrics for binary classification (a) and regression (b) tasks. The colored areas represent global importance, local importance, and surrogate importance metrics. All metrics were standardized to facilitate interpretation, so that a value of 1 is the reference (desired) value for every metric.

Theorems & Definitions (9)

  • Definition 3.1: Feature Importance Divergence
  • Definition 3.2: $\alpha$-Feature Importance
  • Definition 3.3: Fluctuation Ratio
  • Definition 3.4: Rank Alignment
  • Definition 3.5: Rank Consistency
  • Definition 3.6: Importance Stability
  • Definition 3.7: Performance Degradation
  • Definition 3.8: Surrogate Fidelity
  • Definition 3.9: Surrogate Feature Stability