Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation

Lasse Marten Jantsch, Dong-Jae Koh, Seonghyeon Lee, Young-Kyoon Suh

Abstract

Understanding the internal mechanisms of transformer-based large language models (LLMs) is crucial for their reliable deployment and effective operation. Although recent work has produced many attribution methods that try to balance faithfulness and computational efficiency, dense component attribution remains prohibitively expensive. In this work, we introduce Dual Path Attribution (DPA), a novel framework that faithfully traces information flow through a frozen transformer in one forward and one backward pass, without requiring counterfactual examples. DPA analytically decomposes and linearizes the computational structure of SwiGLU Transformers into distinct pathways, along which it propagates a targeted unembedding vector to obtain the effective representation at each residual position. This target-centric propagation achieves O(1) time complexity with respect to the number of model components, scaling to long input sequences and dense component attribution. Extensive experiments on standard interpretability benchmarks demonstrate that DPA achieves state-of-the-art faithfulness and unprecedented efficiency compared to existing baselines.

Paper Structure (61 sections, 47 equations, 13 figures, 13 tables)

Figures (13)

  • Figure 1: Overview of the Dual Path Attribution (DPA) framework for efficient input and component attribution. DPA operates in two stages: (1) one-pass forward execution, in which the model processes the input once while caching the activations required for the backward propagation; and (2) top-down target propagation, in which the unembedding vector of the targeted token is recursively propagated backward through the Transformer modules to identify an effective target at each residual position.
  • Figure 2: Performance comparison with baseline attribution methods on Llama-3.1-8B-Instruct under different top input token masking strategies. Our approach shows lower Top-$k$ disruption and higher Top-$k$ recovery, indicating more accurate identification of important input tokens.
  • Figure 3: Performance comparison with baseline attribution methods on Llama-3.1-8B-Instruct under different top model component masking strategies. Our approach shows lower disruption and higher recovery, indicating more accurate identification of important components.
  • Figure 4: Qualitative attribution examples. Red highlights denote tokens that contribute positively to the target, while blue highlights denote tokens that hinder it. On the IMDb dataset (left), DPA distinguishes helpful from harmful context more precisely than the baseline methods. The right panel illustrates DPA's attribution across different configurations ($\mu$).
  • Figure 5: Sensitivity analysis of Dual Path Attribution (DPA) scaling configurations. We evaluate the impact of varying attribution parameters to prioritize different information pathways.
  • ...and 8 more figures
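
The two stages named in the Figure 1 caption (a single forward pass that caches activations, then a top-down propagation of the unembedding vector) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a hypothetical residual stack of gated MLP layers with no attention or normalization, and it assumes that "linearizing" means holding the cached gate activation fixed. All names (`forward_with_cache`, `propagate_target`, the weight shapes) are invented for illustration.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def forward_with_cache(x0, layers):
    """Stage 1 (toy): run the residual stack once, caching each gate activation."""
    x, gates, residuals = x0, [], [x0]
    for w_gate, w_up, w_down in layers:
        g = silu(w_gate @ x)                 # gate branch, cached for stage 2
        x = x + w_down @ (g * (w_up @ x))    # SwiGLU-style gated MLP plus residual
        gates.append(g)
        residuals.append(x)
    return residuals, gates

def propagate_target(u, layers, gates):
    """Stage 2 (toy): push the target vector top-down through the frozen layers.

    With the gate g held fixed, a layer is the linear map
    x -> x + W_down diag(g) W_up x, so the target is updated with its transpose.
    """
    t, targets = u, [u]
    for (_, w_up, w_down), g in zip(reversed(layers), reversed(gates)):
        t = t + w_up.T @ (g * (w_down.T @ t))
        targets.append(t)
    return targets[::-1]                     # targets[l] pairs with residuals[l]

# Hypothetical sizes and random weights, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
d_model, d_hidden, n_layers = 8, 16, 3
layers = [
    (0.3 * rng.normal(size=(d_hidden, d_model)),   # w_gate
     0.3 * rng.normal(size=(d_hidden, d_model)),   # w_up
     0.3 * rng.normal(size=(d_model, d_hidden)))   # w_down
    for _ in range(n_layers)
]
x0 = rng.normal(size=d_model)   # stand-in for an input embedding
u = rng.normal(size=d_model)    # stand-in for the target token's unembedding vector

residuals, gates = forward_with_cache(x0, layers)
targets = propagate_target(u, layers, gates)

logit = float(u @ residuals[-1])
per_position = [float(t @ x) for t, x in zip(targets, residuals)]
input_attribution = targets[0] * x0   # one score per input dimension

print("target logit:", logit)
print("effective target dotted with residual, per position:", per_position)
print("sum of input attributions:", float(input_attribution.sum()))
```

In this toy setting the effective target at every residual position reproduces the target logit when dotted with the cached residual, and the per-dimension input scores sum to that logit. Only one backward sweep is needed regardless of how many positions are scored, which is the intuition behind a cost that does not grow with the number of attributed components; the paper's actual pathway decomposition and its treatment of attention are not reproduced here.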