Tangentially Aligned Integrated Gradients for User-Friendly Explanations
Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew
TL;DR
This work addresses the base-point sensitivity of Integrated Gradients by introducing tangential alignment, a criterion grounded in the manifold hypothesis that seeks explanations lying in the tangent space $T_{x}M$. It derives theoretical conditions via $H_{x}$ and $E_{x}$, provides a practical optimization to find the optimal base-point $\alpha_x^*$, and demonstrates Tangential IG on four image datasets using a tangent-space estimate from a convolutional autoencoder. Compared to standard base-points and three gradient explainability models, Tangential IG yields higher tangential-content $\mu_x$ and more perceptually meaningful attributions. The approach is general to any base-point attribution method and advances user-friendly, robust explanations for neural networks in vision tasks.
Abstract
Integrated gradients is widely used in machine learning to address the black-box problem of neural networks. The explanations given by integrated gradients depend on a choice of base-point. This choice is not a priori obvious, and different base-points can lead to drastically different explanations. There is a longstanding hypothesis that data lies on a low-dimensional Riemannian manifold. The quality of an explanation on a manifold can be measured by the extent to which the explanation for a point lies in the tangent space at that point. In this work, we propose that the base-point should be chosen to maximise the tangential alignment of the explanation. We formalise the notion of tangential alignment and provide theoretical conditions under which a base-point choice yields explanations lying in the tangent space. We demonstrate how to approximate the optimal base-point on several well-known image classification datasets. Furthermore, we compare the optimal base-point choice with common base-points and three gradient explainability models.
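To make the base-point dependence concrete, the following is a minimal NumPy sketch on a toy problem, not the paper's implementation. It assumes the data manifold is the unit circle in $\mathbb{R}^2$, so the tangent space at $x = (1, 0)$ is spanned by $(0, 1)$, and it assumes tangential alignment can be scored as the fraction of the attribution's Euclidean norm that lies in the tangent space; the paper's exact definition of $\mu_x$ may differ. The model `f`, the helper names, and the candidate base-points are all hypothetical, and a search over a small candidate set stands in for the paper's optimisation of $\alpha_x^*$.

```python
import numpy as np

def integrated_gradients(grad_f, x, base, steps=200):
    """Midpoint-rule approximation of integrated gradients along the
    straight line from `base` to `x`."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = base + alphas[:, None] * (x - base)          # shape (steps, d)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - base) * avg_grad

def tangential_fraction(attr, tangent_basis):
    """Fraction of the norm of `attr` lying in the span of the orthonormal
    columns of `tangent_basis` (an assumed stand-in for mu_x)."""
    norm = np.linalg.norm(attr)
    if norm == 0.0:
        return 0.0
    proj = tangent_basis @ (tangent_basis.T @ attr)
    return float(np.linalg.norm(proj) / norm)

# Hypothetical toy model f(z) = z0^2 + 3*z1 and its analytic gradient.
f = lambda z: z[0] ** 2 + 3.0 * z[1]
grad_f = lambda z: np.array([2.0 * z[0], 3.0])

x = np.array([1.0, 0.0])            # point on the unit circle
tangent = np.array([[0.0], [1.0]])  # orthonormal basis of the tangent space at x

# Hypothetical candidate base-points; the zero base-point is a common default.
candidates = {
    "zero": np.array([0.0, 0.0]),
    "diag": np.array([0.0, -1.0]),
    "shifted": np.array([1.0, -1.0]),
}

attrs = {k: integrated_gradients(grad_f, x, b) for k, b in candidates.items()}
scores = {k: tangential_fraction(a, tangent) for k, a in attrs.items()}
best = max(scores, key=scores.get)  # base-point with the highest alignment
```

In this toy setting the zero base-point produces an attribution that is entirely normal to the manifold, while the base-point that differs from $x$ only along the tangent direction produces an attribution lying fully in the tangent space. In each case the attributions still sum to $f(x) - f(\text{base})$, so the choice of base-point changes where the attribution mass lands rather than whether completeness holds.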
