
Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

Fan Yang, Mengnan Du, Xia Hu

TL;DR

This work formalizes the problem of evaluating explanations in interpretable machine learning without ground truth, crystallizing three core properties: generalizability, fidelity, and persuasibility. It surveys existing methodologies across intrinsic and posthoc, global and local explanations, and proposes a unified hierarchical framework to benchmark explanations for developers and end-users. The authors also identify open problems—especially for local generalizability, posthoc fidelity, and global persuasibility—and discuss limitations such as causal reasoning, completeness, and novelty considerations. The paper aims to advance standardized benchmarking in IML and guide future research toward robust, user-centered explanations.

Abstract

Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations are strongly preferred because they help people better understand how machine learning systems work and further enhance their trust in these systems. However, due to the diversity of scenarios and the subjective nature of explanations, we rarely have ground truth for benchmarking the quality of generated explanations in IML. Having a sense of explanation quality not only matters for assessing system boundaries, but also helps to realize the true benefits to human users in practical settings. To benchmark evaluation in IML, in this article, we rigorously define the problem of evaluating explanations and systematically review existing state-of-the-art efforts. Specifically, we summarize three general aspects of explanation (i.e., generalizability, fidelity, and persuasibility) with formal definitions, and review the representative methodologies for each of them under different tasks. Further, a unified evaluation framework is designed according to the hierarchical needs of developers and end-users, which could be easily adopted for different scenarios in practice. In the end, open problems are discussed, and several limitations of current evaluation techniques are raised for future exploration.

Paper Structure

This paper contains 23 sections and 6 figures.

Figures (6)

  • Figure 1: Illustration of IML techniques. We compare the pipelines of machine learning (ML) and IML. Note that an IML model can provide specific reasons for particular machine decisions, whereas an ML model may simply provide prediction results with probability scores. Taking image classification as an example, an IML model can indicate which parts of the image led it to classify the animal as a husky, while an ML model may only report its overall confidence in the husky classification.
  • Figure 2: Trend of IML research in recent years. We present the number of IML-related research publications from 2010 to 2018 and plot a trendline based on these statistics. The counts were collected from Google Scholar using the keywords "interpretable machine learning". We believe the actual numbers are even larger than those reported, since closely related terms such as "explainable" were not included in the search. The results show that IML-related publications have been increasing exponentially and that much more attention is being paid to this field.
  • Figure 3: A two-dimensional categorization of explanations in IML, covering interpretation scope and interpretation manner. Along these two dimensions, explanations fall into four groups: (a) intrinsic-global; (b) intrinsic-local; (c) posthoc-global; (d) posthoc-local. Each category is illustrated with a representative example: a decision tree for intrinsic-global explanations, an attention mechanism for intrinsic-local ones, mimic learning for posthoc-global ones, and an instance heatmap for posthoc-local ones.
  • Figure 4: Three general properties of explanations in IML: generalizability, fidelity, and persuasibility. Each property corresponds to one specific aspect of evaluation. Generalizability concerns the generalization power of an explanation, fidelity concerns its degree of faithfulness, and persuasibility concerns its degree of usefulness.
  • Figure 5: Illustration of IML evaluation. IML evaluation can be divided into model evaluation and explanation evaluation. Model evaluation focuses on the generalizability of the system and assesses the quality of its predictions. Explanation evaluation focuses on predictability, fidelity, and persuasibility, and assesses the quality of explanations. In addition, some special properties, for instance robustness, capability, and certainty, are entangled with both the model and the explanation. This paper specifically focuses on the aspects related to explanation evaluation.
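Figure 3 pairs mimic learning with posthoc-global explanations and instance heatmaps with posthoc-local ones, and Figure 4 defines fidelity as the degree to which an explanation is faithful to the model. As a minimal sketch, and not the authors' method, two common ground-truth-free fidelity proxies can be written in a few lines of Python. The first is the agreement rate between a mimic (surrogate) model and the black box it imitates. The second is a deletion check that measures how much the model's score drops when the most-attributed features are removed. The toy linear model, the gradient-times-input attribution, the zero baseline, and all function names are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def surrogate_fidelity(black_box: Callable[[Vector], int],
                       surrogate: Callable[[Vector], int],
                       inputs: Sequence[Vector]) -> float:
    """Posthoc-global proxy: share of inputs where the mimic model reproduces the black-box label."""
    if not inputs:
        raise ValueError("need at least one input")
    agree = sum(1 for x in inputs if black_box(x) == surrogate(x))
    return agree / len(inputs)


def deletion_drop(score: Callable[[Vector], float],
                  x: Vector,
                  attribution: Vector,
                  k: int,
                  baseline: float = 0.0) -> float:
    """Posthoc-local proxy: score drop after setting the k most-attributed features to a baseline.

    A larger drop suggests the attribution points at features the model actually relies on.
    """
    top_k = set(sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)[:k])
    x_deleted = [baseline if i in top_k else v for i, v in enumerate(x)]
    return score(x) - score(x_deleted)


# Hypothetical toy black box: a linear score thresholded at zero.
WEIGHTS = [2.0, -1.0, 0.5]


def toy_score(x: Vector) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, x))


def toy_black_box(x: Vector) -> int:
    return int(toy_score(x) > 0)


def toy_surrogate(x: Vector) -> int:
    # Hypothetical mimic model that only looks at the first feature.
    return int(x[0] > 0)


if __name__ == "__main__":
    probes = [[1.0, 0.0, 0.0], [1.0, 3.0, 0.0], [-1.0, 0.0, 1.0], [-0.5, -2.0, 0.0]]
    print(surrogate_fidelity(toy_black_box, toy_surrogate, probes))  # 0.5

    x = [1.0, 0.5, 2.0]
    # Gradient-times-input attribution; for a linear model this is w_i * x_i.
    grad_x_input = [w * v for w, v in zip(WEIGHTS, x)]  # [2.0, -0.5, 1.0]
    print(deletion_drop(toy_score, x, grad_x_input, k=1))  # 2.0
    print(deletion_drop(toy_score, x, [-a for a in grad_x_input], k=1))  # -0.5
```

In this toy run, the faithful attribution yields a larger score drop than its negation, which is the kind of relative comparison a deletion check supports. The absolute size of the drop depends on the chosen baseline, so the zero baseline is itself an assumption.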