Knowing What You Cannot Explain: Learning to Reject Low-Quality Explanations

Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis

Abstract

Learning to Reject (LtR) frameworks allow ML models to abstain from uncertain predictions and promote user trust. However, because current LtR strategies focus solely on predictive performance, they completely neglect explanation quality. Low-quality explanations, whether they inaccurately reflect the model's reasoning or fail to satisfy users, can severely compromise trust assessments and induce over-reliance on incorrect predictions. We argue that models should abstain from making a prediction when they cannot offer a satisfactory explanation for it, and we introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates explanation quality. Focusing on popular attribution techniques, we propose REX (REjector of low-quality eXplanations), which learns a rejector from explanation quality labels that combine machine-side judgments with explicit human annotations. Our empirical evaluation demonstrates that REX outperforms popular LtR strategies and baselines relying on isolated explanation metrics. Finally, to support future research, we publicly release a novel, larger-scale dataset of 1050 human-annotated machine explanations.
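
To make the LtX setting in the abstract concrete, the following is a minimal sketch of a predictor paired with a rejector that abstains when the estimated explanation quality is too low. It is an illustration under stated assumptions, not the REX method itself: REX learns its rejector from quality labels, whereas the function names (predict_or_reject, toy_predictor, toy_explainer, toy_quality), the threshold, and the hand-written quality score here are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

Explanation = Sequence[float]  # one attribution score per input feature


def predict_or_reject(
    x: Sequence[float],
    predictor: Callable[[Sequence[float]], int],
    explainer: Callable[[Sequence[float]], Explanation],
    quality_score: Callable[[Sequence[float], Explanation], float],
    threshold: float,
) -> Tuple[Optional[int], Explanation, bool]:
    """Return (prediction, explanation, rejected).

    The prediction is withheld (None) when the estimated explanation
    quality falls below the threshold.
    """
    explanation = explainer(x)
    rejected = quality_score(x, explanation) < threshold
    prediction = None if rejected else predictor(x)
    return prediction, explanation, rejected


# --- Hypothetical toy components, for illustration only ---

def toy_predictor(x: Sequence[float]) -> int:
    # Linear model with unit weights: predict 1 if the feature sum is positive.
    return int(sum(x) > 0)


def toy_explainer(x: Sequence[float]) -> Explanation:
    # With unit weights, each feature's contribution is the feature value itself.
    return list(x)


def toy_quality(x: Sequence[float], explanation: Explanation) -> float:
    # Assumed stand-in for a learned rejector: share of the total attribution
    # mass carried by the single most important feature.
    total = sum(abs(a) for a in explanation)
    if total == 0:
        return 0.0
    return max(abs(a) for a in explanation) / total


accepted = predict_or_reject([3.0, 0.5, 0.5], toy_predictor, toy_explainer, toy_quality, 0.5)
abstained = predict_or_reject([1.0, -1.0, 1.0], toy_predictor, toy_explainer, toy_quality, 0.5)
```

In this toy setup the first input is accepted because one feature dominates its attribution, while the second is rejected because the attribution mass is spread evenly. A learned rejector would replace toy_quality with a model trained on explanation quality labels.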

Paper Structure

This paper contains 31 sections, 10 equations, 12 figures, 14 tables.

Figures (12)

  • Figure 1: Illustration of REX. LtR is unconcerned with the quality of machine explanations (left). REX instead addresses LtX, which requires rejecting predictions that cannot be explained properly to stakeholders, improving trust assessment and downstream decision quality (right).
  • Figure 2: REX rejects on average more low-quality explanations than all competitors. Average percentage of low-quality explanations in the accepted and rejected sets for all the considered strategies over the five datasets, for $25$ rejection rates $\rho_\%$. Across all considered rejection rates, REX consistently rejects more low-quality explanations than all competitors.
  • Figure 3: Image from the user study illustrating the snapshot (left), the predicted probability of scoring (bottom), and the associated Kernel SHAP explanation (right). The explanation suggests that the feature "distance to goal" slightly increases the probability, while "GK distance to goal line" decreases it.
  • Figure 4: REX rejects on average more low-quality explanations than all competitors over the user study data. Average percentage of low-quality explanations in the accepted and rejected sets for all the considered strategies over the user study data, for $25$ rejection rates $\rho_\%$. Across all considered rejection rates, REX consistently rejects more low-quality explanations than all competitors.
  • Figure 5: Example of the first image of each trial
  • ...and 7 more figures

Theorems & Definitions (1)

  • Definition 1