Is Sparse Attention more Interpretable?

Clara Meister; Stefan Lazov; Isabelle Augenstein; Ryan Cotterell

TL;DR

The paper questions the common claim that sparse attention improves interpretability by examining whether sparsity yields faithful explanations when attention operates on internal representations rather than inputs. It introduces an entropy-based dispersion measure for input influence and evaluates LSTM and Transformer models on three text-classification tasks, observing weak links between inputs and co-indexed representations and no robust mapping from sparse attention to a small set of influential inputs. The results show that increasing sparsity tends to reduce the correlation between attention and input feature importance and does not produce sparse input explanations, suggesting sparsity may actually hinder interpretability. Overall, the findings argue against assuming sparse attention enhances interpretability and emphasize the need for concrete evidence before adopting sparsity-based explanations in NLP models.

Abstract

Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs. Yet the attention distribution is typically over representations internal to the model rather than the inputs themselves, suggesting this assumption may not have merit. We build on the recent work exploring the interpretability of attention; we design a set of experiments to help us understand how sparsity affects our ability to use attention as an explainability tool. On three text classification tasks, we verify that only a weak relationship between inputs and co-indexed intermediate representations exists -- under sparse attention and otherwise. Further, we do not find any plausible mappings from sparse attention distributions to a sparse set of influential inputs through other avenues. Rather, we observe in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
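The TL;DR mentions an entropy-based dispersion measure for input influence. A minimal sketch of that idea follows, under stated assumptions: it uses sparsemax as one well-known sparse alternative to softmax, and a normalized Shannon entropy as the dispersion measure. The function names, the choice of sparsemax, and the example numbers are illustrative assumptions and are not taken from the paper, whose exact definitions may differ.

```python
import math


def softmax(scores):
    """Dense attention: every position receives non-zero weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparse attention: Euclidean projection of the scores onto the simplex.

    Assumption: sparsemax stands in for whichever sparse projection is used.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # Position j is in the support while 1 + j * z_(j) > sum of the top j.
        if 1 + j * zj > cumsum:
            tau = (cumsum - 1) / j
    return [max(s - tau, 0.0) for s in scores]


def normalized_entropy(weights):
    """Dispersion in [0, 1]: 0 = all mass on one item, 1 = uniform."""
    total = sum(weights)
    probs = [w / total for w in weights]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs)) if len(probs) > 1 else 0.0


# Hypothetical attention scores over four intermediate representations.
scores = [1.0, 0.8, 0.1, -0.5]
dense_attention = softmax(scores)
sparse_attention = sparsemax(scores)

# Hypothetical per-token influence magnitudes (e.g. absolute gradients),
# made up here purely to show how the measure would be applied.
input_influence = [0.30, 0.25, 0.25, 0.20]

print("attention entropy (softmax):  ", normalized_entropy(dense_attention))
print("attention entropy (sparsemax):", normalized_entropy(sparse_attention))
print("input influence entropy:      ", normalized_entropy(input_influence))
```

Comparing the entropy of the attention distribution with the entropy of the input influence is one way to check the question the abstract raises: a low-entropy (sparse) attention distribution over internal representations need not correspond to a low-entropy distribution of influence over the inputs.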
