GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection

Jinggang Chen, Junjie Li, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Jing Xiao

TL;DR

GAIA introduces attribution abnormality as a signal for out-of-distribution detection by analyzing gradient-based explanations. It defines two forms, Channel-Wise Average Abnormality and Zero-Deflation Abnormality, and aggregates them across layers via a simple, training-free post-hoc framework. The method achieves strong improvements in FPR95 and AUROC on the CIFAR and ImageNet-1K benchmarks compared with advanced baselines, while remaining parameter-free and agnostic to ID data. The work offers a Taylor-expansion viewpoint on attribution, discusses the limitations of gradient-based attribution for transformers, and points to practical implications for reliable AI systems.

Abstract

Detecting out-of-distribution (OOD) examples is crucial to guarantee the reliability and safety of deep neural networks in real-world settings. In this paper, we offer an innovative perspective on quantifying the disparities between in-distribution (ID) and OOD data -- analyzing the uncertainty that arises when models attempt to explain their predictive decisions. This perspective is motivated by our observation that gradient-based attribution methods encounter challenges in assigning feature importance to OOD data, thereby yielding divergent explanation patterns. Consequently, we investigate how attribution gradients lead to uncertain explanation outcomes and introduce two forms of abnormalities for OOD detection: the zero-deflation abnormality and the channel-wise average abnormality. We then propose GAIA, a simple and effective approach that incorporates Gradient Abnormality Inspection and Aggregation. The effectiveness of GAIA is validated on both commonly utilized (CIFAR) and large-scale (ImageNet-1k) benchmarks. Specifically, GAIA reduces the average FPR95 by 23.10% on CIFAR10 and by 45.41% on CIFAR100 compared to advanced post-hoc methods.
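
The two metrics named above, FPR95 and AUROC, can be illustrated with a short sketch. This is not code from the paper; it assumes the convention that a higher score means a more abnormal (more OOD-like) input, and the function names `fpr_at_95_tpr` and `auroc` are hypothetical. FPR95 is read here as the fraction of OOD samples still accepted as ID when the threshold is set so that 95% of ID samples are accepted.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when 95% of ID samples are accepted.

    Assumed convention (not taken from the paper): higher score = more OOD-like,
    so a sample is accepted as ID when its score is at or below the threshold.
    """
    sorted_id = sorted(id_scores)
    n = len(sorted_id)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding issues.
    k = (95 * n + 99) // 100
    threshold = sorted_id[k - 1]
    accepted_ood = sum(1 for s in ood_scores if s <= threshold)
    return accepted_ood / len(ood_scores)


def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random ID sample.

    Ties count as half a win. This pairwise form is quadratic in the number of
    samples and is meant for illustration only.
    """
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))
```

A lower FPR95 and a higher AUROC both indicate better separation between ID and OOD scores, which is the sense in which the reductions reported above are improvements.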

Paper Structure

This paper contains 15 sections, 12 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Motivation of our work. Gradient-based attribution algorithms use attribution gradients to explain where models look when predicting final outputs. An intriguing question is: when encountering an OOD sample $X_{\text{out}}$ whose label falls outside the in-distribution label space $Y_{\text{in}}$, how does the model interpret its overconfident prediction? To unearth uncertainty from the explanatory result, we inspect the abnormalities in attribution gradients and then aggregate them for OOD detection.
  • Figure 2: Demonstration of the attribution abnormality from gradient-based weights. The toy experiment is conducted on ResNet34 with four blocks trained on CIFAR10. We select four attribution layers from different blocks and calculate the average attribution gradients for each channel.
  • Figure 3: Left (a): Visualization of attribution gradients on feature maps. Right (b): Proportion of non-zero values across different channels. Each data point represents one single channel.
  • Figure 4: Ablation studies on Frobenius norm of matrix $\bm{\Lambda}$.
  • Figure 5: The distribution of the OOD scores in three settings (inner only, output only and fusion). All scores are non-negative for comparison.
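
The per-channel statistics mentioned in the captions of Figures 2 and 3 (the average attribution gradient per channel and the proportion of non-zero values per channel) can be sketched as follows. This is only an illustration of those two summary statistics, not the paper's scoring formula or aggregation step. The input layout (a list of channels, each an H x W grid of gradient values), the `eps` tolerance, and the function names are assumptions made for the example.

```python
def channel_average(grads):
    """Mean gradient value for each channel.

    `grads` is assumed to be a list of C channels, each a list of rows
    (an H x W grid) of gradient values for one attribution layer.
    """
    averages = []
    for channel in grads:
        values = [v for row in channel for v in row]
        averages.append(sum(values) / len(values))
    return averages


def nonzero_proportion(grads, eps=0.0):
    """Proportion of entries in each channel whose magnitude exceeds `eps`.

    `eps` is a hypothetical tolerance; with the default of 0.0 only exact
    zeros are treated as zero.
    """
    proportions = []
    for channel in grads:
        values = [v for row in channel for v in row]
        nonzero = sum(1 for v in values if abs(v) > eps)
        proportions.append(nonzero / len(values))
    return proportions


# Toy example with two 2 x 2 channels.
toy_grads = [
    [[0.0, 2.0], [0.0, 2.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
print(channel_average(toy_grads))
print(nonzero_proportion(toy_grads))
```

In the toy input both channels have the same average, yet the first is half zeros while the second has none, which shows why the two statistics carry different information.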