Gradient based Feature Attribution in Explainable AI: A Technical Review

Yongjie Wang; Tong Zhang; Xu Guo; Zhiqi Shen

TL;DR

This technical survey targets gradient-based feature attribution within Explainable AI, focusing on neural networks and the challenge of explaining opaque decisions. It introduces a novel four-group taxonomy for gradient-based explanations, surveys a broad set of methods from vanilla gradients to integrated gradients and bias-aware approaches, and highlights postprocessing techniques for denoising. The paper consolidates evaluation metrics spanning human-centric and objective fidelity tests, and discusses general and gradient-specific challenges that limit current explanations, such as noise, baseline dependence, and bias contributions. By mapping algorithmic developments to evaluation frameworks and outlining practical research directions, the work provides a roadmap for improving the reliability, faithfulness, and usability of gradient-based explanations in real-world AI systems.
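The TL;DR contrasts vanilla gradients, integrated gradients, and denoising postprocessing. Below is a minimal NumPy sketch of all three; it is an illustration, not the survey's own code. It assumes a hypothetical logistic model f(x) = sigmoid(w · x + b) so that the input gradient has a closed form. The names `W`, `B`, `model`, and `grad` are illustrative. Integrated gradients are approximated with a midpoint Riemann sum, and SmoothGrad-style noise averaging stands in as one representative denoising technique. The parameters `steps`, `sigma`, and `n` are arbitrary choices.

```python
import numpy as np

# Hypothetical toy model (assumption for illustration): f(x) = sigmoid(W . x + B).
# A closed-form gradient keeps the sketch free of autodiff libraries.
W = np.array([1.5, -2.0, 0.5])
B = 0.1

def model(x):
    return 1.0 / (1.0 + np.exp(-(W @ x + B)))

def grad(x):
    # d/dx sigmoid(W . x + B) = s * (1 - s) * W
    s = model(x)
    return s * (1.0 - s) * W

def vanilla_gradient(x):
    # Saliency: the raw gradient of the output with respect to the input.
    return grad(x)

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann approximation of the path integral of gradients
    # along the straight line from the baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    path_grads = [grad(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(path_grads, axis=0)

def smoothgrad(x, sigma=0.15, n=50, seed=0):
    # Denoising by averaging gradients over Gaussian-perturbed copies of x.
    rng = np.random.default_rng(seed)
    noisy = x + sigma * rng.standard_normal((n, x.size))
    return np.mean([grad(z) for z in noisy], axis=0)

x = np.array([1.0, 0.5, 1.0])
baseline = np.zeros_like(x)
ig = integrated_gradients(x, baseline)

print("vanilla gradient:", vanilla_gradient(x))
print("smoothgrad:", smoothgrad(x))
print("integrated gradients:", ig)
print("sum(IG):", ig.sum(), "f(x) - f(baseline):", model(x) - model(baseline))
```

The last printed line illustrates the completeness property of integrated gradients: the attributions sum, up to discretization error, to f(x) - f(baseline). Consequently they shift whenever the chosen baseline changes, which is the baseline dependence noted above.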

Abstract

The surge in black-box AI models has prompted the need to explain their internal mechanisms and justify their reliability, especially in high-stakes applications such as healthcare and autonomous driving. Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research related to explainability, interpretability, and transparency has been developed to explain and analyze models from various perspectives. Consequently, given the sheer volume of papers, it has become challenging to obtain a comprehensive overview of XAI research from all aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI research: gradient-based explanations, which can be directly applied to neural network models. In this review, we systematically explore gradient-based explanation methods to date and introduce a novel taxonomy that categorizes them into four distinct classes. Then, we present the essential technical details in chronological order and underscore the evolution of the algorithms. Next, we introduce both human and quantitative evaluations to measure algorithm performance. More importantly, we discuss the general challenges in XAI and the specific challenges in gradient-based explanations. We hope that this survey can help researchers understand state-of-the-art progress and its corresponding disadvantages, which could spark their interest in addressing these issues in future work.

Paper Structure

This paper contains 21 sections, 37 equations, 4 figures, and 1 table.

Introduction
Purpose of This Survey
Our Contributions
Research outline
Gradient based Feature Attribution
Preliminary
Vanilla Gradients based Explanation
Integrated Gradients based Explanation
Summary
Bias Gradients based Explanation
Postprocessing for Denoising
Summary
Evaluation metrics
Human Evaluation
Localization Tests
...and 6 more sections

Figures (4)

Figure 1: Taxonomy of Explainable AI according to Guidotti et al. (2018). In this survey, we focus on gradient-based explanations in feature attribution.
Figure 2: Taxonomy of gradient based feature attribution.
Figure 3: The differences among backpropagation, deconvolutional network, guided backpropagation, and RectGrad lie in the implementation of ReLU in the backward pass.
Figure 4: The chronological evolution of gradient-based explanations, depicted as a publication timeline.
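Figure 3 states that backpropagation, deconvolutional networks, guided backpropagation, and RectGrad differ only in how the ReLU backward pass is implemented. The sketch below writes the four rules side by side, assuming NumPy, a single ReLU layer, and a hypothetical helper `relu_backward`. Here `z` is the ReLU's forward input and `g` is the gradient signal arriving from above. For RectGrad, it assumes a fixed scalar threshold `tau` on the importance score a * g. In practice that threshold is usually derived from a percentile of the scores, a step omitted here.

```python
import numpy as np

def relu_backward(z, g, rule="backprop", tau=0.0):
    """Signal passed back through a ReLU under different backward rules.

    z:   pre-activation input of the ReLU in the forward pass
    g:   gradient signal arriving from the layer above
    tau: RectGrad threshold (fixed scalar here, a simplifying assumption)
    """
    a = np.maximum(z, 0.0)  # forward ReLU activation
    if rule == "backprop":
        # Standard chain rule: mask by the forward activation pattern.
        return g * (z > 0)
    if rule == "deconvnet":
        # Ignore the forward mask; pass only the positive incoming signal.
        return np.maximum(g, 0.0)
    if rule == "guided":
        # Guided backprop: require both a positive forward input and a positive signal.
        return g * (z > 0) * (g > 0)
    if rule == "rectgrad":
        # Pass only units whose importance a * g exceeds the threshold.
        return g * (a * g > tau)
    raise ValueError(f"unknown rule: {rule}")

z = np.array([2.0, -1.0, 0.5, 3.0])
g = np.array([-0.5, 1.0, 0.2, 0.4])
for rule in ["backprop", "deconvnet", "guided", "rectgrad"]:
    print(rule, relu_backward(z, g, rule, tau=0.5))
```

With `tau=0`, the RectGrad rule reduces to guided backpropagation, because a * g > 0 holds exactly when both the forward input and the incoming signal are positive.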

Theorems & Definitions (1)

Definition 1: Feature Attribution
