Attention Flows are Shapley Value Explanations

Kawin Ethayarajh, Dan Jurafsky

TL;DR

The paper investigates how attention-based explanations in NLP relate to Shapley Values from cooperative game theory. It proves that plain attention weights and leave-one-out scores cannot be Shapley Values (except in degenerate cases), whereas attention flows, a post-processed variant of attention weights obtained by running the max-flow algorithm on the attention graph, are Shapley Values at the layerwise level, that is, when the players all sit in the same layer. This gives attention flows a theoretical grounding as an explanation method that complements gradient-based methods and traditional attention-based explanations. The work also discusses practical applications, limitations, and avenues for future work in making Shapley-based explanations tractable in NLP.

Abstract

Shapley Values, a solution to the credit assignment problem in cooperative game theory, are a popular type of explanation in machine learning, having been used to explain the importance of features, embeddings, and even neurons. In NLP, however, leave-one-out and attention-based explanations still predominate. Can we draw a connection between these different methods? We formally prove that -- save for the degenerate case -- attention weights and leave-one-out values cannot be Shapley Values. Attention flow is a post-processed variant of attention weights obtained by running the max-flow algorithm on the attention graph. Perhaps surprisingly, we prove that attention flows are indeed Shapley Values, at least at the layerwise level. Given the many desirable theoretical qualities of Shapley Values -- which have driven their adoption among the ML community -- we argue that NLP practitioners should, when possible, adopt attention flow explanations alongside more traditional ones.

Paper Structure

This paper contains 12 sections, 3 theorems, 4 equations, 1 figure.

Key Result

Proposition 1

If some player is attended to more than another, there is no TU-game $(N, v)$ for which attention weights are Shapley Values.
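
For reference, the standard textbook definition of the Shapley Value in a TU-game $(N, v)$ is given here. The paper's own Definitions 2.1 to 2.5 are not reproduced in this summary, so its notation may differ from what follows. The Shapley Value of player $i$ is its marginal contribution averaged over all orders in which the players can join the coalition:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \bigl( v(S \cup \{i\}) - v(S) \bigr)$$

One consequence is efficiency: $\sum_{i \in N} \phi_i(v) = v(N)$, so any candidate explanation that is to be a Shapley Value must distribute exactly the total payoff among the players.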

Figures (1)

  • Figure 1: The attention flow network for three tokens across three layers, with player nodes (red) and non-player nodes (blue). The payoff $v(N)$ is the total flow through the network. $\phi_i(v)$ is the total outgoing flow of player $i$. Note that if we remove player $i$, then the total flow will decrease by $\phi_i(v)$, but the outgoing flow of the other two players (red) will stay the same. In other words, the contribution of player $i$ to the total flow $v(N)$ is always $\phi_i(v)$; therefore, $\phi_i(v)$ is its Shapley Value. This construction is possible because the players are all in the same layer and therefore parallel; if one depended on another, then its outgoing flow could not be its Shapley Value.
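
The argument in the Figure 1 caption can be checked numerically with a minimal sketch. It assumes a payoff in which each same-layer player contributes a fixed outgoing flow regardless of which other players are present, as the caption describes. The flow numbers, the token names, and the helpers `shapley_values` and `total_flow` are hypothetical and chosen only for illustration; they are not taken from the paper, and no max-flow computation is performed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley Values by enumerating every coalition.

    Exponential in len(players); only meant for tiny illustrative games.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition s.
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


# Hypothetical outgoing flows for three players in the same layer.
outgoing_flow = {"tok1": 0.5, "tok2": 0.3, "tok3": 0.2}


def total_flow(coalition):
    # Assumed payoff: players are parallel, so removing one player removes
    # exactly its own outgoing flow and leaves the others unchanged.
    return sum(outgoing_flow[p] for p in coalition)


phi = shapley_values(list(outgoing_flow), total_flow)
```

Under this assumption every marginal contribution of a player equals its own outgoing flow, so `phi` matches `outgoing_flow` up to floating-point error. If one player's flow depended on another (players in different layers), the payoff would no longer be additive and this equality would not be expected to hold.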

Theorems & Definitions (11)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • ...and 1 more