Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution

Eslam Zaher; Maciej Trzaskowski; Quan Nguyen; Fred Roosta

TL;DR

The paper tackles the reliability issues of Integrated Gradients (IG) by introducing Manifold Integrated Gradients (MIG), which performs attribution along geodesics on a latent Riemannian manifold learned with a convolutional VAE. MIG maps geodesics from the latent space to the data space via a generator $g$ and integrates classifier gradients along these curved paths, yielding perceptually aligned explanations and improved robustness to adversarial attributional attacks. The key contributions are (1) formulating MIG, (2) deriving its theoretical grounding in Riemannian geometry for feature attribution, and (3) demonstrating superior perceptual quality and robustness on real-image datasets compared with IG and related methods. This data-manifold–aware approach promises safer, more reliable explanations for vision models and could impact high-stakes domains such as medical imaging by reducing noise and vulnerability in gradient-based explanations.
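For orientation, the difference between IG and MIG can be written in standard path-attribution notation. The symbols $F$ (classifier), $x'$ (baseline), $\gamma$ (path in data space), and $c$ (latent geodesic) are assumptions chosen here for illustration; the paper's own notation may differ.

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha
$$

$$
\mathrm{PathIG}^{\gamma}_i(x) = \int_0^1 \frac{\partial F\big(\gamma(\alpha)\big)}{\partial \gamma_i(\alpha)} \, \frac{d\gamma_i(\alpha)}{d\alpha} \, d\alpha,
\qquad \gamma(0) = x', \quad \gamma(1) = x
$$

IG is the special case $\gamma(\alpha) = x' + \alpha (x - x')$. Under the description above, MIG would instead take $\gamma(\alpha) = g(c(\alpha))$, where $c$ is a geodesic in the latent space and $g$ is the generator.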

Abstract

In this paper, we dive into the reliability concerns of Integrated Gradients (IG), a prevalent feature attribution method for black-box deep learning models. We particularly address two predominant challenges associated with IG: the generation of noisy feature visualizations for vision models and the vulnerability to adversarial attributional attacks. Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold. Our experiments utilise deep generative models applied to several real-world image datasets. They demonstrate that IG along the geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, subsequently, substantially increasing robustness to targeted attributional attacks.
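The path-based adaptation described in the abstract can be illustrated with a minimal sketch that integrates gradients along an arbitrary discretized path. Everything here is an assumption for illustration: the names `path_integrated_gradients`, `toy_f`, and `toy_grad` are hypothetical, the "classifier" is a toy function, and the curved path is a hand-picked stand-in for a mapped geodesic, not one computed from a VAE.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def path_integrated_gradients(
    grad_f: Callable[[Sequence[float]], Vector],
    path: Sequence[Sequence[float]],
) -> Vector:
    """Approximate path-integrated gradients along a discretized path.

    path[0] is the baseline and path[-1] is the input. Each segment adds
    grad_f(segment midpoint) * (segment displacement), a Riemann-sum
    approximation of the line integral of the gradient along the path.
    """
    dim = len(path[0])
    attributions = [0.0] * dim
    for start, end in zip(path[:-1], path[1:]):
        midpoint = [(s + e) / 2.0 for s, e in zip(start, end)]
        gradient = grad_f(midpoint)
        for i in range(dim):
            attributions[i] += gradient[i] * (end[i] - start[i])
    return attributions


# Toy stand-in for a classifier output (hypothetical): F(x) = x0 * x1.
def toy_f(x: Sequence[float]) -> float:
    return x[0] * x[1]


def toy_grad(x: Sequence[float]) -> Vector:
    return [x[1], x[0]]


STEPS = 200
ts = [k / STEPS for k in range(STEPS + 1)]
baseline, x = [0.0, 0.0], [1.0, 2.0]

# Straight-line path, as used by IG.
straight = [[t * x[0], t * x[1]] for t in ts]
# Hand-picked curved path with the same endpoints (NOT a true geodesic).
curved = [[t, 2.0 * t * t] for t in ts]

attr_straight = path_integrated_gradients(toy_grad, straight)
attr_curved = path_integrated_gradients(toy_grad, curved)

print("straight:", attr_straight, "sum =", sum(attr_straight))
print("curved:  ", attr_curved, "sum =", sum(attr_curved))
print("F(x) - F(baseline) =", toy_f(x) - toy_f(baseline))
```

Both paths distribute the same total, `F(x) - F(baseline)`, across the features, but they split it differently. This is why the choice of path matters for the resulting attribution map.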

Paper Structure (22 sections, 20 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Background
Path-based Feature Attribution Methods
Attributional Attacks
VAEs for Generative Manifold Learning
Riemannian Geometry for Feature Attribution
Latent Space Geometry in Deep Generative Models
Integrated Gradients on the Data Manifold
Experiments
Geodesic Paths of Attribution
Perceptual Attribution Maps along the Geodesics
Robustness to Targeted Attributional Attacks
Quantitative Analysis
Metrics
Explanation Infidelity (INFD) [FidelitySensitivityExplanations_2019]
...and 7 more sections

Figures (5)

Figure 1: Schematic of our setup. The underlying image data manifold is learned using a convolutional VAE. The latent space corresponds to a Riemannian manifold, where the geodesic path (shown in red) between two points is the shortest path in this curved geometry. The linear path (shown in blue) does not conform to the intrinsic geometry of the manifold and strays into regions off the manifold. Reconstructions from the VAE, along with the labels, are used to train a classifier. MIG uses the geodesic path as the path of attribution, whereas IG uses the linear path in the image space.
Figure 2: The surface model implied by the smooth generator function $g$ mapping from the latent space $\mathcal{Z}$ to the data space $\mathcal{X}$. In this example, the latent manifold $\mathcal{M}$ is a one-dimensional embedded submanifold of $\mathbb{R}^{2}$ and the images lie on a two-dimensional embedded submanifold of $\mathbb{R}^{3}$. The geodesic on the latent manifold is mapped to a smooth curve on the data manifold, respecting the underlying geometry.
Figure 3: Mapped geodesic interpolation in MIG vs. linear interpolation in IG. (a) contrasts the interpolants along MIG's smooth path with those along IG's linear path from a black baseline. (b) displays the classifier response curve for the images along each path: MIG's smooth path (red) shows a gradual response because key features appear later on the path, whereas IG's linear path (blue) shows a rapid escalation followed by a wide saturation region. (c) shows the corresponding feature visualizations. MIG produces more perceptually aligned and less noisy feature visualizations than IG.
Figure 4: Comparison of feature attribution methods. Presented are feature maps from various methods: Saliency, Gradients $\times$ Input, GuidedBackprop, IG, Smooth IG, EIG, and our proposed MIG. As shown, MIG addresses IG's noise limitation and surpasses the other methods, producing distinctly clearer and more perceptually aligned visualizations. In the last row, the similarity between EIG and MIG indicates that the path of attribution in MIG passes through a nearly flat region of the data manifold, so the linear interpolation path employed by EIG can closely approximate the mapped geodesic path in MIG for this particular image.
Figure 5: MIG vs. IG under targeted attributional attacks. The figure displays examples of an original input image alongside a target image and an adversarial attributional attack designed to exploit IG's linear path of attribution. IG's vulnerability is evident: it generates adversarial feature maps that erroneously mimic the target maps. MIG maintains perceptually consistent and noise-resistant feature visualizations for the adversarial examples, closely resembling those of the original input. Each row uses a different classifier backbone: VGG-16, ResNet18, and InceptionV1, respectively.
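
The geometric idea behind Figures 1 and 2, that a straight line in the latent space need not be the shortest curve once it is mapped through the generator $g$, can be checked on a toy surface. The sketch below is only an illustration under stated assumptions: `toy_generator` is a hypothetical smooth map from $\mathbb{R}^{2}$ to $\mathbb{R}^{3}$ (a flat sheet with a tall bump), not a trained VAE decoder, and the code compares the lengths of two hand-picked latent curves rather than solving for a geodesic.

```python
import math
from typing import Callable, List, Sequence

Point = List[float]


def toy_generator(z: Sequence[float], height: float = 3.0, width: float = 0.3) -> Point:
    """Hypothetical generator: embeds the latent plane as a surface with a bump at the origin."""
    bump = height * math.exp(-(z[0] ** 2 + z[1] ** 2) / (2.0 * width ** 2))
    return [z[0], z[1], bump]


def mapped_curve_length(
    generator: Callable[[Sequence[float]], Point],
    latent_curve: Sequence[Sequence[float]],
) -> float:
    """Length of a latent curve measured in data space, as a sum of chord lengths."""
    images = [generator(z) for z in latent_curve]
    return sum(math.dist(a, b) for a, b in zip(images[:-1], images[1:]))


STEPS = 200
ts = [k / STEPS for k in range(STEPS + 1)]

# Straight latent line from (-1, 0) to (1, 0); it climbs over the bump.
straight_latent = [[-1.0 + 2.0 * t, 0.0] for t in ts]
# Semicircular detour with the same endpoints; it stays on the flat part.
detour_latent = [[-math.cos(math.pi * t), math.sin(math.pi * t)] for t in ts]

straight_length = mapped_curve_length(toy_generator, straight_latent)
detour_length = mapped_curve_length(toy_generator, detour_latent)

print("straight latent line, length in data space:", straight_length)
print("curved latent detour, length in data space:", detour_length)
```

On this surface the curved detour is shorter in data space than the straight latent line, even though it is longer in latent coordinates. A geodesic minimizes this data-space length over all curves between the two endpoints.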
