Saliency-Bench: A Comprehensive Benchmark for Evaluating Visual Explanations

Yifei Zhang; James Song; Siyi Gu; Tianxu Jiang; Bo Pan; Guangji Bai; Liang Zhao

TL;DR

Saliency-Bench addresses the fragmentation in evaluating visual explanations by introducing a standardized benchmark with eight diverse, annotated datasets and a unified evaluation pipeline that jointly measures alignment and faithfulness of saliency maps. It benchmarks multiple saliency methods, including GradCAM, GradCAM++, Integrated Gradients, InputXGradient, Occlusion, RISE, and ViT attention, across CNN and transformer architectures, revealing dataset- and model-dependent strengths and limitations. The work provides an easy-to-use API (xaibenchmark) to load data, generate explanations, and compute metrics, enabling reproducible comparisons and accelerating progress in XAI. By systematically analyzing both alignment (mIoU, Pointing Game) and faithfulness (iAUC) metrics, Saliency-Bench offers practical insights into how explanations correspond to ground-truth reasoning and model behavior, with implications for deploying trustworthy explanations in real-world tasks.
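As a rough illustration of the two alignment metrics named above, the sketch below computes an IoU score and a Pointing Game hit for a single saliency map against a binary ground-truth mask. It is a minimal sketch and not the xaibenchmark implementation: the function names, the min-max normalization, and the 0.5 threshold are assumptions chosen for illustration. mIoU would then be the mean of the per-image IoU values.

```python
import numpy as np

def binarize(saliency, threshold=0.5):
    """Min-max normalize a saliency map to [0, 1], then threshold it.

    The normalization and the 0.5 default are illustrative assumptions.
    """
    s = saliency.astype(float)
    span = s.max() - s.min()
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return s >= threshold

def iou(saliency, gt_mask, threshold=0.5):
    """Intersection over union of the thresholded map and the annotation."""
    pred = binarize(saliency, threshold)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union > 0 else 0.0

def pointing_game_hit(saliency, gt_mask):
    """True if the single most salient pixel lies inside the annotated region."""
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return bool(gt_mask[row, col])

# Toy 3x3 example (made-up values, not benchmark data).
saliency = np.array([[0.1, 0.2, 0.0],
                     [0.3, 0.9, 0.8],
                     [0.0, 0.7, 0.1]])
gt_mask = np.array([[0, 0, 0],
                    [0, 1, 1],
                    [0, 0, 0]])

print(iou(saliency, gt_mask))                # 2 overlapping pixels / 3 in the union
print(pointing_game_hit(saliency, gt_mask))  # peak at (1, 1) is inside the mask
```

Thresholding matters for IoU but not for the Pointing Game, which only looks at the location of the peak; that is one reason the two scores can disagree on the same map.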

Abstract

Explainable AI (XAI) has gained significant attention for providing insight into the decision-making processes of deep learning models, particularly in image classification, where visual explanations are presented as saliency maps. Despite the success of these methods, evaluating them remains challenging because annotated datasets and standardized evaluation pipelines are lacking. In this paper, we introduce Saliency-Bench, a novel benchmark suite designed to evaluate visual explanations generated by saliency methods across multiple datasets. We curated, constructed, and annotated eight datasets with corresponding ground-truth explanations, covering diverse tasks such as scene classification, cancer diagnosis, object classification, and action classification. The benchmark includes a standardized, unified evaluation pipeline that assesses both the faithfulness and the alignment of visual explanations, providing a holistic assessment of explanation performance. We benchmark widely used saliency methods on these eight datasets across different image classifier architectures to evaluate explanation quality. Additionally, we developed an easy-to-use API that automates the evaluation pipeline, from data access and data loading to result evaluation. The benchmark is available via our website: https://xaidataset.github.io.
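To make the notion of faithfulness concrete, the following sketch scores a saliency map with an insertion-style curve: pixels are revealed from most to least salient on top of a blank baseline, the model score is recorded after each step, and the area under that curve is returned. This is an assumption about how an insertion-based metric can be computed, not the benchmark's own code; `insertion_auc`, `score_fn`, the zero baseline, and the step count are all hypothetical names and choices.

```python
import numpy as np

def insertion_auc(image, saliency, score_fn, baseline=None, steps=10):
    """Area under the score curve as salient pixels are progressively revealed.

    score_fn is a hypothetical callable mapping an image array to a scalar
    (for example, the class probability returned by a classifier).
    """
    h, w = saliency.shape
    n = h * w
    if baseline is None:
        baseline = np.zeros_like(image)      # assumed blank starting canvas
    canvas = baseline.astype(float).copy()
    flat_canvas = canvas.reshape(n, -1)      # view: writes update `canvas`
    flat_image = image.astype(float).reshape(n, -1)
    order = np.argsort(-saliency.ravel())    # most salient pixel first

    fractions = [0.0]
    scores = [score_fn(canvas)]
    chunk = int(np.ceil(n / steps))
    for start in range(0, n, chunk):
        idx = order[start:start + chunk]
        flat_canvas[idx] = flat_image[idx]   # reveal the next batch of pixels
        fractions.append(min(start + chunk, n) / n)
        scores.append(score_fn(canvas))

    fractions = np.asarray(fractions)
    scores = np.asarray(scores, dtype=float)
    # Trapezoidal rule over the fraction of pixels revealed.
    return float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(fractions)))

# Toy "model" that only looks at the top-left pixel (illustrative only).
image = np.ones((2, 2))
score_fn = lambda x: float(x[0, 0])

good_map = np.array([[4, 3], [2, 1]])  # ranks the decisive pixel first
bad_map = np.array([[1, 2], [3, 4]])   # ranks the decisive pixel last

print(insertion_auc(image, good_map, score_fn, steps=4))  # 0.875
print(insertion_auc(image, bad_map, score_fn, steps=4))   # 0.125
```

A map that ranks the pixels the model actually relies on first drives the score up early and so earns a larger area, which is the sense in which such a metric rewards faithfulness to model behavior rather than agreement with human annotation.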

Paper Structure (29 sections, 3 equations, 10 figures, 4 tables)

Introduction
Related Work
Saliency Methods for Visual Explanation
Evaluation Metrics for Saliency Methods
Datasets for XAI and Saliency Benchmarking
Task Formulation
A Comprehensive Benchmark for Evaluating Visual Explanations
Overview of Saliency-Bench
Dataset Collection
Standardized Evaluation Pipeline
Alignment-based metrics
Faithfulness-based metrics
Experiments
Experimental Settings
Results and Analysis
...and 14 more sections

Figures (10)

Figure 1: Example images from the eight datasets—Gender-XAI, Environment-XAI, Disease-XAI, Cancer-XAI, Security-XAI, Pet-XAI, Action-XAI, and Object-XAI—across different tasks. Each image is paired with a ground-truth explanation annotation.
Figure 2: Overview of Saliency-Bench: A Comprehensive Benchmark for Evaluating Visual Explanations.
Figure 3: Examples of mIoU and Pointing Game comparing saliency maps generated by Grad-CAM with ground-truth annotations on the Action-XAI dataset.
Figure 4: Qualitative results of visual explanation methods: (1) Original image; (2) Saliency map generated by GradCAM; (3) Saliency map generated by InputXGradient; (4) Saliency map generated by the attention mechanisms of ViT-B/16.
Figure 5: xaibenchmark Python package installation.
...and 5 more figures
