Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow

Behrooz Azarkhalili, Maxwell Libbrecht

TL;DR

This paper tackles the problem of extracting reliable feature attributions from Transformer models by moving beyond attention weights to a principled network-flow approach. It introduces Generalized Attention Flow (GAF), which builds an Information Tensor from attention and its gradients and solves a barrier-regularized maximum flow problem to obtain unique, Shapley-valued attributions. The authors prove that the barrier-regularized outputs satisfy the Shapley axioms and demonstrate through extensive benchmarks that a variant of GAF often outperforms existing attribution methods on sequence classification tasks. An open-source Python package is provided to compute these attributions for encoder-only Transformer models, enabling broader application and evaluation in NLP settings. Overall, the approach offers a theoretically grounded and practically effective framework for interpreting Transformer decisions.
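To give a feel for why a barrier term produces a unique solution that approaches the unregularized maximum flow, here is a minimal toy sketch. It is not the paper's formulation: it assumes a single edge with capacity `capacity`, a log barrier `log f + log(capacity - f)`, and a hypothetical helper `barrier_flow`. The regularized objective `f + mu * (log f + log(capacity - f))` is strictly concave on the open interval, so it has exactly one maximizer, and that maximizer moves toward the full capacity as `mu` shrinks.

```python
import math


def barrier_flow(capacity: float, mu: float) -> float:
    """Unique maximizer of f + mu * (log f + log(capacity - f)) on (0, capacity).

    Toy illustration only: one edge, log barrier, mu > 0.
    """
    # Setting the derivative 1 + mu/f - mu/(capacity - f) to zero gives
    # f^2 + (2*mu - capacity)*f - mu*capacity = 0; the positive root is the maximizer.
    b = 2.0 * mu - capacity
    return (-b + math.sqrt(b * b + 4.0 * mu * capacity)) / 2.0


capacity = 1.0
mus = (1.0, 0.1, 0.01, 0.001)
# One regularized optimum per mu; each stays strictly inside (0, capacity).
flows = [barrier_flow(capacity, mu) for mu in mus]
```

In this toy setting the regularized optimum stays strictly below the capacity for every `mu > 0` and gets arbitrarily close to it as `mu` goes to 0, which mirrors the limiting behavior the paper states for its barrier-regularized problem.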

Abstract

This paper introduces Generalized Attention Flow (GAF), a novel feature attribution method for Transformer-based models to address the limitations of current approaches. By extending Attention Flow and replacing attention weights with the generalized Information Tensor, GAF integrates attention weights, their gradients, the maximum flow problem, and the barrier method to enhance the performance of feature attributions. The proposed method exhibits key theoretical properties and mitigates the shortcomings of prior techniques that rely solely on simple aggregation of attention weights. Our comprehensive benchmarking on sequence classification tasks demonstrates that a specific variant of GAF consistently outperforms state-of-the-art feature attribution methods in most evaluation settings, providing a more reliable interpretation of Transformer model outputs.
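The Attention Flow baseline that GAF extends can be sketched as a maximum flow computation on a layered graph. The sketch below is an assumption-laden illustration, not the authors' package: it uses plain attention weights as edge capacities (GAF would use its Information Tensor and a barrier-regularized solver instead), the function name `attention_flow` and the node labels `(layer, token)` are hypothetical, and `attentions[l][i, j]` is assumed to be the head-aggregated weight with which token `i` in layer `l + 1` attends to token `j` in layer `l`.

```python
import networkx as nx
import numpy as np


def attention_flow(attentions, target=0):
    """Max flow from each input token to one output token (illustrative only).

    attentions: list of (n, n) arrays, one per layer (assumed already
    aggregated over heads). Returns an array with one score per input token.
    """
    n = attentions[0].shape[0]
    num_layers = len(attentions)
    graph = nx.DiGraph()
    for layer, weights in enumerate(attentions):
        for i in range(n):
            for j in range(n):
                # Information moves from token j at `layer` to token i at `layer + 1`.
                graph.add_edge((layer, j), (layer + 1, i), capacity=float(weights[i, j]))
    scores = np.zeros(n)
    for j in range(n):
        # Source: input token j; sink: the chosen token at the last layer.
        scores[j] = nx.maximum_flow_value(graph, (0, j), (num_layers, target))
    return scores


# Two tokens, two layers, each token attending only to itself.
scores = attention_flow([np.eye(2), np.eye(2)], target=0)
```

With identity attention, all flow into output token 0 originates from input token 0, so that token receives the entire attribution. In general a plain maximum flow problem can have several optimal flows, which is the non-uniqueness the barrier regularization described above is meant to remove.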

Paper Structure

This paper contains 32 sections, 4 theorems, 19 equations, 11 figures, 6 tables, 2 algorithms.

Key Result

Theorem 2.1

For any strictly convex barrier function $\psi(\bm{f})$, convex function $\xi(\bm{f})$, and $\mu>0$, there exists a unique optimal point $\bm{f}^*_{\mu}$. Furthermore, $\lim_{\mu \to 0} \bm{f}^*_{\mu} = \bm{f}^*$, indicating that for any arbitrary $\epsilon > 0$, we can select a sufficiently small ...

Figures (11)

  • Figure 1: Schematic overview of Generalized Attention Flow created using Algorithm 1 and Algorithm 2.
  • Figure 2: Initial network flow to be used in our proposed method, the multi-commodity flow with multiple sources and targets, and the network flow for MCC problems.
  • Figure 3: Overview of how the proposed method computes the unique optimal flow using the log barrier method, attention weights, and their gradients in Transformers.
  • Figure 4: Network flows and optimal flows generated by Algorithm 1 and Algorithm 2. The optimal flows computed using Algorithm 1 and Algorithm 2 are not equivalent.
  • Figure 5: Normalized feature attributions for Transformer's input layer and different information tensors.
  • ...and 6 more figures

Theorems & Definitions (10)

  • Definition 2.1: Minimum Cost Circulation
  • Remark 2.1
  • Theorem 2.1
  • Corollary 3.1
  • Definition 3.1: Shapley values
  • Theorem 3.1: Log Barrier Regularization of Generalized Attention Flow Outcomes Shapley Values
  • Corollary 3.2
  • Definition A.1: Network Flow
  • Proof of Corollary 3.1
  • Proof of Theorem 3.1