Attention Consistency for LLMs Explanation

Tian Lan, Jinyuan Xu, Xue He, Jenq-Neng Hwang, Lei Li

TL;DR

MACS is a lightweight, inference-time heuristic for token attribution in decoder-only Transformer models that measures the cross-layer consistency of the strongest input-attention links. Unlike full-aggregation methods, MACS applies layer-wise max-pooling, a floor bias, and multiplicative accumulation across layers, followed by z-score normalization, to yield clear, sparse attributions with real-time efficiency. On a SQuAD 2.0 subset (QA), MACS ranks ground-truth answer tokens higher (AUC-PR) and reaches competitive faithfulness (SRG) relative to the compared baselines, while requiring far less VRAM and preserving throughput. A preliminary VQA study suggests that MACS can extend to multimodal Transformers by analyzing attention in cross-modal layers, highlighting its potential as a general, efficient diagnostic tool for interpretability across Transformer architectures.
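The four steps named in the TL;DR (max-pooling, floor bias, multiplicative accumulation, z-score) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function name `macs_scores`, the input layout `(layers, heads, n_input)`, pooling over heads, an additive floor, and the default `floor=0.1` are all assumptions made here for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def macs_scores(attn, floor=0.1, eps=1e-8):
    """Hypothetical MACS-style score for one generated token.

    attn: attention weights from the current generated token to each
          input token, assumed shape (layers, heads, n_input).
    floor: assumed additive floor bias (> 0) that keeps one weak layer
           from zeroing out the whole product.
    """
    attn = np.asarray(attn, dtype=float)
    if floor <= 0:
        raise ValueError("floor must be positive")

    # 1) Layer-wise max-pooling: strongest link per input token (assumed over heads).
    pooled = attn.max(axis=1)                      # (layers, n_input)

    # 2) Floor bias (assumed additive form).
    biased = pooled + floor

    # 3) Multiplicative accumulation across layers, done in log space
    #    for numerical stability.
    raw = np.exp(np.log(biased).sum(axis=0))       # (n_input,)

    # 4) z-score normalization over input tokens.
    return (raw - raw.mean()) / (raw.std() + eps)

# Toy example: token 2 is attended strongly in every layer,
# token 0 only in a single layer.
attn = np.full((3, 2, 4), 0.05)
attn[:, 0, 2] = 0.8
attn[1, 1, 0] = 0.9
scores = macs_scores(attn)
print(int(scores.argmax()))  # index of the most consistently attended token
```

Because the per-layer values are multiplied, a token that is strongly attended in only one layer (token 0 above) ends up well below a token that is strongly attended in every layer (token 2), which is the "consistency" intuition described in the summary.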

Abstract

Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment. However, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency.

Paper Structure

This paper contains 36 sections, 10 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Peak VRAM usage (a) and throughput (b) against context size for different XAI methods. One marker denotes Out-of-Memory (OOM) errors; another indicates prohibitive inference times (>10 mins or near-zero throughput). Baseline is inference without any XAI method. MACS maintains high efficiency across context lengths.
  • Figure 2: During generation, MACS dynamically highlights the image regions corresponding to the text as it is being produced. The generated text is: “The image shows a young child and a white dog, sitting together in a grassy outdoor setting. The child is wearing a red cap, a red and gray jacket, and has a backpack on.” Source: PerholsCopyright
  • Figure 3: MACS demonstrating anticipatory attention on a QA example. The heatmap shows MACS attribution scores on the input context prior to the model generating the answer "France". High consistency scores (darker red) on "France" in the context indicate MACS identifies the answer span before its generation by the model (Q: Question, G: Generated tokens).