Better Explain Transformers by Illuminating Important Information

Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li

TL;DR

The paper tackles the challenge of explaining Transformer decisions by identifying and preserving important information while suppressing irrelevant signals during attribution. It introduces Mask-LRP, a post-hoc explanation method that refines Layer-wise Relevance Propagation by masking attention heads that focus on nonessential information, guided by syntactic and positional head masks. Empirical results on classification and question-answering tasks show consistent improvements over baselines in explanation quality, with ablations highlighting the detrimental effect of irrelevant information on LRP and visualizations revealing a progression from internal to interaction information across layers. The approach is model-agnostic and scalable to various Transformer architectures, offering a more faithful and actionable explanation of model behavior for debugging and trust.
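The head-masking idea can be illustrated in a few lines. The sketch below assumes a gradient-weighted attention-relevance aggregation in the style of earlier Transformer LRP work: a head-averaged, positive-only product of gradient and relevance, accumulated layer by layer with a residual identity term. It simply zeroes out unimportant heads before averaging. The function name `masked_relevance_rollout`, the array layout, and the aggregation rule are illustrative assumptions, not the Mask-LRP code from the repository.

```python
import numpy as np

def masked_relevance_rollout(attn_grads, attn_relevance, head_masks):
    """Aggregate per-layer attention relevance into token-level relevance,
    keeping only heads flagged as important (hypothetical helper).

    attn_grads, attn_relevance: lists with one (H, T, T) array per layer,
        ordered from the first to the last layer.
    head_masks: list with one (H,) array per layer, 1.0 = important head, 0.0 = masked.
    Returns a (T, T) relevance matrix; row 0 is often read as [CLS] -> tokens.
    """
    num_tokens = attn_grads[0].shape[-1]
    relevance = np.eye(num_tokens)  # each token starts as relevant only to itself
    for grad, rel, mask in zip(attn_grads, attn_relevance, head_masks):
        kept = mask.sum()
        if kept == 0:
            continue  # no important head in this layer: skip its attention update
        per_head = np.clip(grad * rel, 0.0, None)  # keep positive evidence only
        # Sum over the head axis with 0/1 weights, then average over kept heads.
        layer_map = np.tensordot(mask, per_head, axes=1) / kept
        relevance = relevance + layer_map @ relevance  # residual-aware update
    return relevance
```

Masking happens before the head average, so an unimportant head contributes nothing to the propagated relevance rather than merely being down-weighted.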

Abstract

Transformer-based models excel in various natural language processing (NLP) tasks, attracting numerous efforts to explain their inner workings. Prior methods explain Transformers by using raw gradients and attention as token attribution scores, and they often let non-relevant information enter the explanation computation, which produces confusing results. In this work, we propose highlighting important information and eliminating irrelevant information through a refined information flow built on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as important attention heads and focus on the relevance obtained from these heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and therefore should be masked during explanation computation. Compared with eight baselines on both classification and question-answering datasets, our method consistently performs better, with improvements ranging from over 3% to 33% on explanation metrics. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
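Identifying positional heads can be illustrated with a simple heuristic. A minimal sketch, assuming that a head counts as positional if, for at least 90% of query tokens, its highest attention weight falls at one fixed relative offset (here the previous or next token). The function `positional_head_mask`, the offsets, and the 0.9 threshold are hypothetical choices for illustration. The paper's criterion may differ, and syntactic heads would need a separate test against dependency parses.

```python
import numpy as np

def positional_head_mask(attn, offsets=(-1, 1), threshold=0.9):
    """Flag heads whose strongest attention sits at a fixed relative offset.

    attn: array (L, H, T, T) of attention probabilities for one sentence
          (rows = querying token, columns = attended token).
    Returns an (L, H) array with 1.0 for positional heads and 0.0 otherwise.
    """
    num_layers, num_heads, num_tokens, _ = attn.shape
    targets = attn.argmax(axis=-1)            # (L, H, T): attended position per query
    rel = targets - np.arange(num_tokens)     # relative offset of that position
    mask = np.zeros((num_layers, num_heads))
    for off in offsets:
        share = (rel == off).mean(axis=-1)    # fraction of queries using this offset
        mask = np.maximum(mask, (share >= threshold).astype(float))
    return mask
```

In practice such a mask would be averaged over many sentences before a head is declared important; a single sentence is used here only to keep the sketch short.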

Paper Structure (32 sections, 14 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Distributions of the relative positions of dependents for different syntactic relations in SST-2.
  • Figure 2: Illustration of our method. Gradients and relevance are propagated through the Transformer block from the final layer to the first layer. We extract two types of important information during the LRP process in all blocks by identifying the important heads.
  • Figure 3: AOPC and LOdds scores of different methods in explaining $\text{BERT}_{\text{base}}$ against the corruption rate $k$ on SST-2. Note that higher AOPC and lower LOdds scores are better.
  • Figure 4: Comparison before and after corrupting the generated mask on SST-2. The blue line shows average values as a solid line, with the standard deviation as a shaded area. The method's explanation performance drops after corruption is added.
  • Figure 5: Different types of important heads in the $\text{BERT}_{\text{base}}$ model across different datasets. The $x$-axis denotes the position of the attention head, and the $y$-axis denotes the position of the Transformer block. Attention heads in earlier blocks tend to focus on simple internal information (e.g., position), while attention heads in later blocks tend to focus on complex interactions between tokens (e.g., syntactic relations).
  • ...and 8 more figures
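Figure 3 reports AOPC and LOdds as a function of the corruption rate $k$. A minimal sketch of both metrics at a single $k$, assuming their commonly used definitions: AOPC averages the drop in the predicted-class probability after the top-$k$ fraction of tokens (ranked by attribution score) is replaced with a mask token, and LOdds averages the log ratio of the corrupted to the original probability. The helpers `corrupt_top_k` and `aopc_lodds_at_k`, the `[MASK]` replacement, the `prob_fn` interface, and the toy model are assumptions, not the paper's evaluation code.

```python
import math

def corrupt_top_k(tokens, scores, k, mask_token="[MASK]"):
    """Replace the top-k fraction of tokens (ranked by attribution score) with mask_token."""
    n_remove = round(k * len(tokens))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    return [mask_token if i in removed else tok for i, tok in enumerate(tokens)]

def aopc_lodds_at_k(prob_fn, examples, k, eps=1e-12):
    """Average AOPC and LOdds over examples at corruption rate k.

    prob_fn(tokens, label) -> model probability of `label` for `tokens`.
    examples: list of (tokens, scores, label), where `label` is the model's
        original prediction and `scores` are per-token attribution scores.
    """
    drops, log_odds = [], []
    for tokens, scores, label in examples:
        p_orig = prob_fn(tokens, label)
        p_corr = prob_fn(corrupt_top_k(tokens, scores, k), label)
        drops.append(p_orig - p_corr)  # AOPC term: probability drop
        log_odds.append(math.log(max(p_corr, eps) / max(p_orig, eps)))  # LOdds term
    n = len(examples)
    return sum(drops) / n, sum(log_odds) / n

def toy_prob(tokens, label):
    # Hypothetical model: confidence in the label grows with each "good" token.
    return 0.2 + 0.3 * tokens.count("good")

example = (["a", "good", "movie", "good"], [0.0, 0.9, 0.1, 0.8], "pos")
aopc, lodds = aopc_lodds_at_k(toy_prob, [example], k=0.5)
# aopc ≈ 0.6 and lodds ≈ log(0.25) ≈ -1.386
```

A faithful explanation ranks the decisive tokens first, so removing them causes a large probability drop (high AOPC) and a strongly negative log ratio (low LOdds), which matches the "higher AOPC, lower LOdds is better" reading of Figure 3.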