Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
Eslam Zaher, Maciej Trzaskowski, Quan Nguyen, Fred Roosta
TL;DR
The paper addresses two reliability problems of Integrated Gradients (IG): noisy attribution maps for vision models and vulnerability to adversarial attributional attacks. It introduces Manifold Integrated Gradients (MIG), which performs attribution along geodesics on a latent Riemannian manifold learned with a convolutional VAE. MIG maps each latent geodesic to the data space via a generator $g$ and integrates the classifier's gradients along the resulting curved path, rather than along the straight line used by IG. The key contributions are (1) formulating MIG, (2) grounding it theoretically in Riemannian geometry for feature attribution, and (3) demonstrating, on real-image datasets, better perceptual quality and greater robustness to attributional attacks than IG and related methods. Because the attribution path follows the data manifold, the resulting explanations are less noisy and harder to manipulate, which is relevant to high-stakes domains such as medical imaging.
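To make the contrast with IG concrete, the two attributions can be written side by side. The notation here is an assumption for illustration and may differ from the paper's: $F$ is the classifier output, $x'$ is a baseline, and $\gamma$ is a latent curve with $g(\gamma(0)) = x'$ and $g(\gamma(1)) = x$. The first line is the standard IG definition along a straight line; the second is the generic path-integral form evaluated on the decoded curve $g \circ \gamma$.

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha
$$

$$
\mathrm{MIG}_i(x) = \int_0^1 \frac{\partial F\big(g(\gamma(t))\big)}{\partial x_i}\, \frac{d\, g_i(\gamma(t))}{dt}\, dt
$$

For a differentiable path, both forms satisfy completeness by the gradient theorem: $\sum_i \mathrm{IG}_i(x) = \sum_i \mathrm{MIG}_i(x) = F(x) - F(x')$. The two differ only in how that total is distributed across features.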
Abstract
In this paper, we dive into the reliability concerns of Integrated Gradients (IG), a prevalent feature attribution method for black-box deep learning models. We particularly address two predominant challenges associated with IG: the generation of noisy feature visualizations for vision models and the vulnerability to adversarial attributional attacks. Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold. Our experiments utilise deep generative models applied to several real-world image datasets. They demonstrate that IG along the geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, subsequently, substantially increasing robustness to targeted attributional attacks.
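The idea of integrating gradients along a curved rather than a straight path can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `f`, `grad_f`, and the decoder `g` are hypothetical stand-ins for the classifier, its gradient, and the VAE generator, and the latent curve is a plain linear interpolation rather than a true geodesic.

```python
import math

def path_integrated_gradients(grad_fn, path):
    """Midpoint-rule approximation of path-integrated gradients.

    path: list of points (lists of floats) running from baseline to input.
    grad_fn: maps a point to the gradient of the model output at that point.
    """
    dim = len(path[0])
    attributions = [0.0] * dim
    for a, b in zip(path[:-1], path[1:]):
        midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        grad = grad_fn(midpoint)
        for i in range(dim):
            # Gradient component times the step taken along that feature.
            attributions[i] += grad[i] * (b[i] - a[i])
    return attributions

# Hypothetical toy "classifier" and its analytic gradient.
def f(x):
    return x[0] * x[1] + x[1] ** 2

def grad_f(x):
    return [x[1], x[0] + 2 * x[1]]

# Hypothetical "decoder": maps a 1-D latent code onto a circular arc in 2-D.
def g(z):
    return [math.cos(z), math.sin(z)]

steps = 200
# Assumption: a linearly interpolated latent curve stands in for a geodesic.
latents = [0.5 * math.pi * k / steps for k in range(steps + 1)]
curved_path = [g(z) for z in latents]
baseline, x = curved_path[0], curved_path[-1]

# Standard IG uses the straight line between baseline and input in data space.
straight_path = [
    [b + (xi - b) * k / steps for b, xi in zip(baseline, x)]
    for k in range(steps + 1)
]

ig = path_integrated_gradients(grad_f, straight_path)
mig_like = path_integrated_gradients(grad_f, curved_path)
print("straight path:", ig)
print("curved path:  ", mig_like)
```

Both attribution vectors sum to `f(x) - f(baseline)`, but they distribute that total differently across features, which is the degree of freedom a manifold-aware path exploits.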
