The Susceptibility of Example-Based Explainability Methods to Class Outliers

Ikhtiyor Nematov, Dimitris Sacharidis, Tomer Sagi, Katja Hose

TL;DR

The paper examines how class outliers (high-loss, ambiguous training examples) affect local, example-based explanations for black-box models. It reformulates evaluation metrics around relevance, distinguishability, and correctness, and introduces a framework that treats outliers as potentially informative rather than as instances to be suppressed outright. In experiments on SMS Spam text classification and a dog-vs-fish image task, using four explainers (IF, RIF, DM, TracIn) and explanation sizes $N \in \{2,5,10\}$, the study shows that outliers degrade relevance and distinguishability, and that suppression strategies can hurt correctness. The findings highlight the need for robust, outlier-aware explainability methods and for metrics that recognize the value of informative, ambiguous instances in model explanations.

Abstract

This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models. We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability. Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers. We conduct experiments on two datasets, a text classification dataset and an image classification dataset, and evaluate the performance of four state-of-the-art explainability methods. Our findings underscore the need for robust techniques to tackle the challenges posed by class outliers.

Paper Structure

This paper contains 12 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The two most popular images that are returned as explanations for four different explainability methods.
  • Figure 2: Popularity probability density function (image classification)
  • Figure 3: Popularity vs. Loss (image classification)