Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations

Yehonatan Elisha, Seffi Cohen, Oren Barkan, Noam Koenigstein

TL;DR

Rethinking Saliency Maps introduces the RFxG taxonomy, which aligns saliency explanations with the questions human users ask along two axes: Reference Frame (pointwise vs. contrastive) and Semantic Granularity (class vs. group). It argues that existing evaluation mostly captures pointwise faithfulness while neglecting contrastive reasoning and granularity, and it proposes four faithfulness metrics (CCS, CGC, PGS, CGS) grounded in perturbation analyses. The paper evaluates ten saliency methods across four architectures and three datasets, showing that contrastive and group-aware explanations (notably those produced by IIA) better capture discriminative and semantic cues. It offers a practical framework for developing explanations that are both faithful to model behavior and meaningful to users in high-stakes settings, with potential extensions to NLP and multimodal explanations.
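
One way to make the two RFxG axes concrete is to note that each question type selects a different scalar target whose gradient a saliency method then explains. The sketch below is a minimal illustration under our own assumptions, not the paper's formulation: a hypothetical linear classifier `z = W x + b`, gradient × input attribution, a logit difference for contrastive questions, and a log-sum-exp over member logits as the group score. The class indices, the toy weights, and the helper names `target_grad_wrt_logits` and `saliency` are all hypothetical.

```python
import numpy as np


def softmax(v):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(v - np.max(v))
    return e / e.sum()


def target_grad_wrt_logits(z, question):
    """Gradient dT/dz of a scalar target T(z) encoding one RFxG question type.

    These target definitions are illustrative assumptions, not the paper's.
    """
    kind = question[0]
    g = np.zeros_like(z, dtype=float)
    if kind == "pointwise_class":        # "Why c?"                       T = z_c
        _, c = question
        g[c] = 1.0
    elif kind == "contrastive_class":    # "Why c and not a?"             T = z_c - z_a
        _, c, a = question
        g[c] += 1.0
        g[a] -= 1.0
    elif kind == "pointwise_group":      # "Why group G?"                 T = logsumexp(z_G)
        _, group = question
        idx = np.array(group)
        g[idx] = softmax(z[idx])
    elif kind == "class_vs_group":       # "Why c and not the rest of G?" T = z_c - logsumexp(z_{G minus c})
        _, c, group = question
        rest = np.array([k for k in group if k != c])
        g[c] += 1.0
        g[rest] -= softmax(z[rest])
    else:
        raise ValueError(f"unknown question type: {kind}")
    return g


def saliency(W, b, x, question):
    # Hypothetical linear model z = W x + b, so dT/dx = W^T dT/dz exactly.
    z = W @ x + b
    grad_x = W.T @ target_grad_wrt_logits(z, question)
    return grad_x * x  # gradient x input attribution


# Toy setup (hypothetical): classes 0=husky, 1=malamute, 2=beagle, 3=tabby; group "dog" = {0, 1, 2}.
W = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 3.0]])
b = np.zeros(4)
x = np.ones(3)

questions = {
    "Why husky?": ("pointwise_class", 0),
    "Why husky and not malamute?": ("contrastive_class", 0, 1),
    "Why dog?": ("pointwise_group", [0, 1, 2]),
    "Why husky and not other dogs?": ("class_vs_group", 0, [0, 1, 2]),
}
maps = {text: saliency(W, b, x, q) for text, q in questions.items()}
```

Each question yields a different map from the same model and input: the pointwise class map credits every feature that raises the husky logit, while the two contrastive maps assign negative attribution to features that favor the alternatives. With a real network, the same targets would be backpropagated through the model instead of multiplying by `W.T`.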

Abstract

Saliency maps are widely used for visual explanations in deep learning, but a fundamental lack of consensus persists regarding their intended purpose and alignment with diverse user queries. This ambiguity hinders the effective evaluation and practical utility of explanation methods. We address this gap by introducing the Reference-Frame × Granularity (RFxG) taxonomy, a principled conceptual framework that organizes saliency explanations along two essential axes. Reference-Frame distinguishes between pointwise ("Why this prediction?") and contrastive ("Why this and not an alternative?") explanations. Granularity ranges from fine-grained class-level (e.g., "Why Husky?") to coarse-grained group-level (e.g., "Why Dog?") interpretations. Using the RFxG lens, we demonstrate critical limitations in existing evaluation metrics, which overwhelmingly prioritize pointwise faithfulness while neglecting contrastive reasoning and semantic granularity. To systematically assess explanation quality across both RFxG dimensions, we propose four novel faithfulness metrics. Our comprehensive evaluation framework applies these metrics to ten state-of-the-art saliency methods, four model architectures, and three datasets. By advocating a shift toward user-intent-driven evaluation, our work provides both the conceptual foundation and the practical tools necessary to develop visual explanations that are not only faithful to the underlying model behavior but also meaningfully aligned with the complexity of human understanding and inquiry.
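
The exact definitions of the four proposed metrics are not given in this summary. As a generic illustration only, the sketch below adapts a standard deletion test to a contrastive question: features are set to a baseline in order of decreasing saliency while the class-vs-alternative logit margin is tracked, so under this convention a smaller area under the curve means the map found evidence that actually separates the two classes. This is not the paper's CCS or any of its other metrics; the toy model, the zero baseline, and the helper names `deletion_curve` and `deletion_auc` are assumptions.

```python
import numpy as np


def deletion_curve(model, x, saliency_map, target_fn, baseline=0.0):
    """Remove features in decreasing saliency order (set them to `baseline`)
    and record target_fn(model(x)) before any removal and after each step."""
    order = np.argsort(-saliency_map, kind="stable")  # most salient first
    x_cur = np.array(x, dtype=float)  # copy so the caller's input is untouched
    scores = [target_fn(model(x_cur))]
    for idx in order:
        x_cur[idx] = baseline
        scores.append(target_fn(model(x_cur)))
    return np.array(scores)


def deletion_auc(scores):
    # Trapezoidal area under the curve, with the fraction of removed features on [0, 1].
    dt = 1.0 / (len(scores) - 1)
    return dt * np.sum((scores[1:] + scores[:-1]) / 2.0)


# Toy linear model (hypothetical): z = W x; classes 0 and 1 are the contrasted pair.
W = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])


def model(v):
    return W @ v


def margin(z):
    # Contrastive target "Why class 0 and not class 1?": the logit margin.
    return z[0] - z[1]


x = np.array([2.0, 1.0, 1.0])
pointwise_map = W[0] * x              # gradient x input for T = z_0
contrastive_map = (W[0] - W[1]) * x   # gradient x input for T = z_0 - z_1

auc_pointwise = deletion_auc(deletion_curve(model, x, pointwise_map, margin))
auc_contrastive = deletion_auc(deletion_curve(model, x, contrastive_map, margin))
```

In this toy setup the contrastive map drives the margin down faster (area -0.5 versus about -0.17 for the pointwise map). The pointwise map ranks feature 1 highly because it also raises the class-0 logit, yet that feature favors class 1, so deleting it partly restores the margin; a pointwise-only test would not expose this mismatch.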

Paper Structure

This paper contains 31 sections, 8 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Our RFxG explanation axes. Intermediate explanation types also exist between these points, for example class-group contrastive questions such as "Why Husky and not other Dogs?"
  • Figure 2: Saliency maps for a sports car under different question types from our taxonomy: (a) Pointwise class. (b) Contrastive between two classes. (c) Contrastive between a class and a group.
  • Figure 3: Saliency maps for a sports car produced by different methods for question types from our taxonomy: (a) Class Contrastive. (b) Class-Group Contrastive. (c) Pointwise Group.