
AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations

Ikhtiyor Nematov, Dimitris Sacharidis, Tomer Sagi, Katja Hose

TL;DR

AIDE addresses the need for explanations that reveal contrasting reasons behind model predictions by identifying influential training samples. It uses a first-order influence-function approximation to compute each sample's influence and organizes explanations into four quadrants (S, SC, O, OC: samples that support or oppose the prediction, either directly or by contrast) to provide contrastive views aligned with three user intents. AIDE employs proximity- and diversity-aware greedy sampling, with an IQR-based outlier filter and a weighted objective, to select a compact, informative set of samples for each intent. Across text and image classification tasks, AIDE shows improved faithfulness and continuity over baselines, and a user study supports the value of intent-aware, contrastive explanations for human-AI collaboration.
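To make the influence step concrete, here is a minimal sketch of first-order influence for a logistic-regression model, written with the standard library only. It assumes parameters `theta`, labels in {0, 1}, a bias folded into each feature vector, and a damped average Hessian; the helper names (`grad_logloss`, `hessian`, `solve`, `support_scores`) are hypothetical and this is not AIDE's implementation. The score is the negated influence-on-test-loss, so a positive value means upweighting that training sample lowers the test loss (it supports the prediction) and a negative value means it opposes it.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def grad_logloss(theta, x, y):
    """Gradient of -[y log p + (1 - y) log(1 - p)] w.r.t. theta, with y in {0, 1}."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(p - y) * xi for xi in x]

def hessian(theta, X, damping=1e-3):
    """Average Hessian of the training loss plus damping * I (keeps it invertible)."""
    d = len(theta)
    H = [[damping if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x in X:
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        w = p * (1.0 - p) / len(X)
        for i in range(d):
            for j in range(d):
                H[i][j] += w * x[i] * x[j]
    return H

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v

def support_scores(theta, X, Y, x_t, y_t, damping=1e-3):
    """grad l(z_t)^T H^{-1} grad l(z) for each training z (H is symmetric)."""
    h_inv_g = solve(hessian(theta, X, damping), grad_logloss(theta, x_t, y_t))
    return [sum(a * b for a, b in zip(h_inv_g, grad_logloss(theta, x, y)))
            for x, y in zip(X, Y)]
```

Sorting training samples by these scores separates supporters (positive) from opposers (negative); the largest magnitudes are the most influential. Real models would replace the explicit Hessian solve with an iterative approximation.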

Abstract

In many use cases, it is important to explain the prediction of a black-box model by identifying the most influential training data samples. Existing approaches lack customization for user intent and often provide a homogeneous set of explanation samples, failing to reveal the model's reasoning from different angles. In this paper, we propose AIDE, an approach for providing antithetical (i.e., contrastive), intent-based, diverse explanations for opaque and complex models. AIDE distinguishes three types of explainability intents: interpreting a correct, investigating a wrong, and clarifying an ambiguous prediction. For each intent, AIDE selects an appropriate set of influential training samples that support or oppose the prediction either directly or by contrast. To provide a succinct summary, AIDE uses diversity-aware sampling to avoid redundancy and increase coverage of the training data. We demonstrate the effectiveness of AIDE on image and text classification tasks, in three ways: quantitatively, assessing correctness and continuity; qualitatively, comparing anecdotal evidence from AIDE and other example-based approaches; and via a user study, evaluating multiple aspects of AIDE. The results show that AIDE addresses the limitations of existing methods and exhibits desirable traits for an explainability method.
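The diversity-aware sampling step, combined with the IQR-based outlier filter mentioned in the TL;DR, can be sketched as below. Several details are assumptions rather than the paper's specification: the filter is applied to influence scores with Tukey fences (k = 1.5), samples live in a Euclidean feature space, and the weighted objective is `w_inf * |influence| + w_prox * (-distance to test point) + w_div * (distance to nearest already-selected sample)`. All function and weight names are hypothetical.

```python
import math
from statistics import quantiles

def iqr_filter(scores, k=1.5):
    """Keep indices whose score lies within [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if lo <= s <= hi]

def greedy_select(candidates, feats, x_t, scores, k, w_inf=1.0, w_prox=1.0, w_div=1.0):
    """Pick up to k candidates, each step maximizing a weighted sum of |influence|,
    proximity to the test point, and distance to the already-selected set."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def gain(i):
            prox = -math.dist(feats[i], x_t)  # nearer to the test instance scores higher
            # Diversity: distance to the closest sample picked so far (0 for the first pick).
            div = min((math.dist(feats[i], feats[j]) for j in selected), default=0.0)
            return w_inf * abs(scores[i]) + w_prox * prox + w_div * div
        best = max(pool, key=gain)  # ties go to the earliest candidate in the pool
        selected.append(best)
        pool.remove(best)
    return selected

def select_explanations(feats, x_t, scores, k):
    """Hypothetical pipeline: drop influence outliers, then greedily pick k samples."""
    return greedy_select(iqr_filter(scores), feats, x_t, scores, k)
```

With `w_div=0` the procedure degenerates to ranking by influence and proximity, which tends to return near-duplicates; the diversity term is what pushes later picks toward uncovered regions of the training data.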

Paper Structure

This paper contains 14 sections, 1 theorem, 12 equations, 11 figures, 10 tables.

Key Result

Lemma 1

In binary classification with logistic loss, the influence of a training point ${\bm{z}}$ on the predictions of ${\bm{z}}_t = ({\bm{x}}_t, y_t)$ and ${\bm{z}}_t' = ({\bm{x}}_t, 1-y_t)$ is related as follows:
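The relation itself is not reproduced on this page. As a hedged sketch, assuming the standard first-order influence $\mathcal{I}(\bm{z}, \bm{z}_t) = -\nabla_\theta \ell(\bm{z}_t, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta \ell(\bm{z}, \hat\theta)$, labels $y \in \{0, 1\}$, and a linear logit $\hat\theta^\top \bm{x}_t$ (notation assumed here, not taken from the paper), the label enters the test gradient only through the scalar $\sigma_t - y$. That yields the relation below, which may differ in form from the lemma's exact statement.

```latex
\begin{align*}
\mathcal{I}\big(\bm{z}, (\bm{x}_t, y)\big) &= -\big(\sigma_t - y\big)\, \bm{x}_t^\top H_{\hat\theta}^{-1} \nabla_\theta \ell(\bm{z}, \hat\theta),
  \qquad \sigma_t = \sigma(\hat\theta^\top \bm{x}_t), \\
\mathcal{I}(\bm{z}, \bm{z}_t') &= \frac{\sigma_t - (1 - y_t)}{\sigma_t - y_t}\, \mathcal{I}(\bm{z}, \bm{z}_t)
  = -\frac{p_t}{1 - p_t}\, \mathcal{I}(\bm{z}, \bm{z}_t),
  \qquad p_t = P_{\hat\theta}(y_t \mid \bm{x}_t).
\end{align*}
```

For a general logit $f_\theta$, $\bm{x}_t$ is replaced by $\nabla_\theta f_{\hat\theta}(\bm{x}_t)$ and the same ratio holds. Since $p_t/(1-p_t) > 0$, every training point's influence flips sign between the two labels, scaled by the model's odds for $y_t$.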

Figures (11)

  • Figure 1: Explanations for a spam classification task, depicting a correctly classified spam message and its influence-based explanations generated by IF and AIDE.
  • Figure 2: Explanations to clarify an ambiguous prediction.
  • Figure 3: Continuity in terms of explanation similarity vs. instance pair similarity on the spam dataset.
  • Figure 4: Continuity in terms of explanation similarity vs. instance pair similarity on the image dataset.
  • Figure 5: Explanations to interpret a correct prediction.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Lemma 1
  • Proof