
From Confusion to Clarity: ProtoScore -- A Framework for Evaluating Prototype-Based XAI

Helena Monke, Benjamin Sae-Chew, Benjamin Fresz, Marco F. Huber

TL;DR

ProtoScore addresses the lack of objective benchmarks for prototype-based XAI, particularly in time-series contexts, by integrating the Co-12 properties into a unified, automated evaluation framework. It defines latent-space preliminaries, extends the Co-12 properties with prototype-specific metrics, and provides concrete formulas to quantify correctness, consistency, continuity, contrastivity, covariate complexity, compactness, confidence, input completeness, and latent-space cohesion. Through exemplary use cases and multi-dataset experiments, the framework shows how the MAP and MSP prototype methods perform across these metrics, guiding practitioners in method selection while highlighting trade-offs and dataset dependencies. The framework emphasizes reproducibility, reduces reliance on costly user studies, and offers a path toward richer, human-centered validation by linking its quantitative metrics to subsequent user studies.

Abstract

The complexity and opacity of neural networks (NNs) pose significant challenges, particularly in high-stakes fields such as healthcare, finance, and law, where understanding decision-making processes is crucial. To address these issues, the field of explainable artificial intelligence (XAI) has developed various methods aimed at clarifying AI decision-making, thereby facilitating appropriate trust and validating the fairness of outcomes. Among these methods, prototype-based explanations offer a promising approach that uses representative examples to elucidate model behavior. However, a critical gap exists regarding standardized benchmarks to objectively compare prototype-based XAI methods, especially in the context of time series data. This lack of reliable benchmarks results in subjective evaluations, hindering progress in the field. We aim to establish a robust framework, ProtoScore, for assessing prototype-based XAI methods across different data types with a focus on time series data, facilitating fair and comprehensive evaluations. By integrating the Co-12 properties of Nauta et al., this framework enables effective comparison of prototype methods with each other and with other XAI methods, ultimately assisting practitioners in selecting appropriate explanation methods while minimizing the costs associated with user studies. All code is publicly available at https://github.com/HelenaM23/ProtoScore.

Paper Structure

This paper contains 31 sections, 24 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Architecture of a typical prototype network following Li, Liu, Chen, and Rudin (2018). It consists of an autoencoder and a classification network. In the prototype layer, each data sample is represented, e.g., by the weighted distances of its latent representation to the prototypes. Other prototype methods may have slightly different structures. The benchmark focuses primarily on the latent space, marked by the red rectangle.
  • Figure 2: Example prototypes for the MSP model with index 5. The left panel shows the prototype representing the abnormal class; the right panel shows the prototype for the normal class, in which specific conditions were detected.
  • Figure 3: Dimension-reduced representation of the latent space for the MAP model with index 1. Note that the dimensionality reduction distorts distances, causing centroids and prototypes to appear outside their respective clusters. Small dots represent input data, with lines connecting each prototype (circle) to its corresponding cluster centroid (square). Points are color-coded by class.
  • Figure 4: Without outliers
  • Figure 5: With outliers
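The prototype layer described in the Figure 1 caption can be illustrated with a minimal NumPy sketch, assuming squared Euclidean distances between latent codes and prototypes followed by a linear layer and softmax, in the style of Li et al. (2018). The function names `prototype_distances` and `classify`, the hand-set `weights`, and the toy data are hypothetical; this is not the ProtoScore code or the exact network trained in the paper.

```python
import numpy as np


def prototype_distances(z: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every latent code to every prototype.

    z: (n, d) latent representations; prototypes: (m, d) prototype vectors.
    Returns an (n, m) distance matrix.
    """
    diff = z[:, None, :] - prototypes[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def classify(z: np.ndarray, prototypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map prototype distances to class probabilities (linear layer + softmax).

    weights: (m, k) matrix; negative entries let a small distance to a
    class's prototype raise that class's logit.
    """
    logits = prototype_distances(z, prototypes) @ weights
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Toy example (hypothetical): one prototype per class in a 2-D latent space.
prototypes = np.array([[0.0, 0.0], [3.0, 3.0]])
weights = -np.eye(2)  # hand-set: prototype j votes for class j
z = np.array([[0.1, 0.0], [2.9, 3.1]])
probs = classify(z, prototypes, weights)
print(probs.argmax(axis=1))  # [0 1]
```

In a trained network the weights are learned rather than hand-set; the negative diagonal here simply makes "closer to prototype j" mean "more likely class j".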
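The Figure 3 caption connects each prototype to its class cluster centroid and warns that the 2-D projection distorts distances. A minimal sketch of measuring those prototype-to-centroid distances directly in the full latent space might look as follows. The function names and data are hypothetical, and this is only an illustration of the quantity drawn in the figure, not the paper's latent-space cohesion metric.

```python
import numpy as np


def class_centroids(z, labels):
    """Mean latent vector of each class; returns (sorted classes, centroids)."""
    classes = np.unique(labels)
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def prototype_centroid_distances(prototypes, proto_labels, z, labels):
    """Euclidean distance from each prototype to its own class centroid.

    Computed in the full latent space rather than a 2-D projection, since
    dimensionality reduction distorts distances.
    Assumes every prototype label also occurs in `labels`.
    """
    classes, centroids = class_centroids(z, labels)
    idx = np.searchsorted(classes, proto_labels)
    return np.linalg.norm(prototypes - centroids[idx], axis=1)


# Toy latent codes (hypothetical): two well-separated classes.
z = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
prototypes = np.array([[1.0, 1.0], [11.0, 10.0]])
proto_labels = np.array([0, 1])
print(prototype_centroid_distances(prototypes, proto_labels, z, labels))  # [1. 0.]
```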