BONES: a Benchmark fOr Neural Estimation of Shapley values

Davide Napolitano; Luca Cagliero

TL;DR

Shapley Values (SVs) are foundational for explaining black-box predictions, but exact SV computation is intractable due to the exponential number of feature subsets, $2^d$. BONES addresses reproducibility and standardization gaps by delivering a modular, open-source benchmark that co-locates neural and traditional SV estimators, benchmark datasets, ground-truth generation, evaluation metrics, and visualization tools. Key evaluation components include $L_1$ and $L_2$ distances, Kendall correlation, a comparative score $P = 1 - \frac{d_i - d_{\min}}{d_{\max} - d_{\min}}$, and image-specific Inclusion/Exclusion AUC. Case studies on Monks UCI and ImageNette demonstrate BONES' ability to enable fair comparisons of accuracy, efficiency, and robustness across tabular and image modalities. BONES is designed to be modality-agnostic and extensible, supporting easy integration of new neural SV estimators, datasets, and evaluation procedures to advance reproducible XAI research.
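The tabular metrics listed above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the BONES API: the function names are hypothetical, each explainer is assumed to return one Shapley value per feature, and $d_i$ in the comparative score is assumed to be explainer $i$'s distance from the exact values, so the closest explainer scores $P = 1$ and the farthest $P = 0$.

```python
import numpy as np
from scipy.stats import kendalltau


def l1_distance(estimate, exact):
    """Sum of absolute differences between estimated and exact Shapley values."""
    return float(np.sum(np.abs(np.asarray(estimate) - np.asarray(exact))))


def l2_distance(estimate, exact):
    """Euclidean distance between estimated and exact Shapley values."""
    return float(np.linalg.norm(np.asarray(estimate) - np.asarray(exact)))


def kendall_correlation(estimate, exact):
    """Rank agreement of the feature attributions (1 = identical ordering)."""
    tau, _ = kendalltau(estimate, exact)
    return float(tau)


def comparative_scores(distances):
    """Hypothetical P = 1 - (d_i - d_min) / (d_max - d_min) for each explainer."""
    d = np.asarray(list(distances.values()), dtype=float)
    d_min, d_max = d.min(), d.max()
    span = d_max - d_min
    # If all explainers tie, give every one the best score to avoid dividing by zero.
    if span == 0:
        return {name: 1.0 for name in distances}
    return {name: float(1.0 - (di - d_min) / span) for name, di in distances.items()}


# Toy ground truth and two hypothetical explainers' estimates.
exact = np.array([0.40, -0.10, 0.25, 0.05])
estimates = {
    "explainer_a": np.array([0.38, -0.12, 0.27, 0.04]),
    "explainer_b": np.array([0.20, 0.05, 0.30, 0.10]),
}
l2 = {name: l2_distance(est, exact) for name, est in estimates.items()}
print(comparative_scores(l2))  # explainer_a gets 1.0, explainer_b gets 0.0
```

Because $P$ is normalized by $d_{\max} - d_{\min}$, it is relative to the set of explainers being compared: adding or removing an explainer can change every other explainer's score.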

Abstract

Shapley Values are an established concept in eXplainable AI (XAI). They are used to explain black-box predictive models by quantifying each feature's contribution to the model's outcome. Since computing exact Shapley Values is known to be computationally intractable on real-world datasets, neural estimators have emerged as alternative, more scalable approaches for obtaining approximate Shapley Value estimates. However, experiments with neural estimators are currently hard to replicate, as algorithm implementations, explainer evaluators, and result visualizations are neither standardized nor readily usable. To bridge this gap, we present BONES, a new benchmark focused on the neural estimation of Shapley Values. It provides researchers with a suite of state-of-the-art neural and traditional estimators, a set of commonly used benchmark datasets, ad hoc modules for training black-box models, and dedicated functions to easily compute the most popular evaluation metrics and visualize results. The purpose is to simplify XAI model usage, evaluation, and comparison. In this paper, we showcase BONES results and visualizations for XAI model benchmarking on both tabular and image data. The open-source library is available at the following link: https://github.com/DavideNapolitano/BONES.
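The intractability of exact computation follows from the Shapley formula itself, $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,[v(S \cup \{i\}) - v(S)]$, where each of the $d$ features requires a sum over all $2^{d-1}$ subsets of the remaining features. The following is a minimal brute-force sketch, not BONES code: `exact_shapley` and `toy_value` are hypothetical, and we assume a value function `v` that returns the model payoff for a coalition of features (how absent features are removed is left open here). It iterates over $d \cdot 2^{d-1}$ (feature, subset) pairs, which is why exact ground truth is only practical for low-dimensional data.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, d):
    """Exact Shapley values for d players by enumerating every coalition."""
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        # Sum over all 2^(d-1) subsets S of the other players, grouped by size.
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of player i to coalition S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy game: players 0 and 2 act alone, players 0 and 1 interact.
def toy_value(coalition):
    payoff = 2.0 * (0 in coalition) + 1.0 * (2 in coalition)
    if {0, 1} <= coalition:
        payoff += 3.0
    return payoff


print(exact_shapley(toy_value, 3))
```

In the toy game the interaction payoff of 3 is split equally between players 0 and 1, so the result is approximately `[3.5, 1.5, 1.0]`, and the values sum to $v(N) - v(\emptyset) = 6$.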

Paper Structure (20 sections, 1 equation, 5 figures, 2 tables)

Introduction
Related Works
XAI tools
XAI models
Traditional Approaches
Neural Approaches
The BONES Benchmark
Datasets
Explainers
Black-Box Models
Evaluation functions
Estimation error
Computational cost
Comparative analysis
Visualization
...and 5 more sections

Figures (5)

Figure 1: Example of a bar plot comparing the global Shapley Values estimated by six explainers on the Monks dataset against the ground truth (i.e., Exact).
Figure 2: Quadrant plot combining computational time and the L2 distance metric.
Figure 3: Examples of visualization plots showing how computational time varies with the number of dataset features (upper plot) and the number of dataset samples (bottom plot).
Figure 4: Image plot: comparison of the Shapley Value masks computed by the different explainers on an ImageNette sample.
Figure 5: AUC Exclusion (left) and Inclusion (right) computed on ImageNette.
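Figure 5 reports Inclusion and Exclusion AUC. A common way to compute such curves, and the one assumed in this sketch (the exact BONES protocol may differ), is to rank features by Shapley value, then either reveal them one by one on a fully masked input (Inclusion) or mask them one by one on the original input (Exclusion), recording the target-class probability after each step. `inclusion_exclusion_auc`, the zero baseline, and the flat feature vector standing in for image patches are all hypothetical choices. A good explainer yields a high Inclusion AUC and a low Exclusion AUC.

```python
import numpy as np


def _trapezoid_auc(curve):
    """Trapezoidal area under a curve sampled at evenly spaced points on [0, 1]."""
    y = np.asarray(curve, dtype=float)
    n_steps = len(y) - 1
    return float(np.sum((y[:-1] + y[1:]) / 2.0) / n_steps)


def inclusion_exclusion_auc(predict, x, shapley, baseline=None):
    """Hypothetical Inclusion/Exclusion AUC for a single sample.

    predict  -- callable mapping a 1-D feature array to the target-class probability
    x        -- 1-D array of features (e.g., flattened image patches)
    shapley  -- one attribution per feature
    baseline -- values used for masked features (zeros by default, an assumption)
    """
    x = np.asarray(x, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(x)
    baseline = np.asarray(baseline, dtype=float)
    order = np.argsort(-np.asarray(shapley, dtype=float))  # most important first

    included = baseline.copy()  # Inclusion starts from the fully masked input
    excluded = x.copy()         # Exclusion starts from the original input
    incl_curve = [predict(included)]
    excl_curve = [predict(excluded)]
    for idx in order:
        included[idx] = x[idx]         # reveal the next most important feature
        excluded[idx] = baseline[idx]  # mask the next most important feature
        incl_curve.append(predict(included))
        excl_curve.append(predict(excluded))

    return _trapezoid_auc(incl_curve), _trapezoid_auc(excl_curve)


# Hypothetical linear "model" whose exact attributions (zero baseline) are its weights.
weights = np.array([0.5, 0.3, 0.2])


def linear_predict(v):
    return float(np.dot(weights, v))


print(inclusion_exclusion_auc(linear_predict, np.ones(3), weights))
print(inclusion_exclusion_auc(linear_predict, np.ones(3), -weights))
```

For a linear model with a zero baseline the Shapley values equal the weights times the inputs, so the first call uses an exact ranking and returns approximately (0.6, 0.4); reversing the ranking in the second call swaps the two AUCs, which is the degradation these metrics are designed to expose.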