Understanding and Patching Compositional Reasoning in LLMs

Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian, Ying Wei

TL;DR

The paper interrogates why LLMs falter on compositional reasoning, showing that implicit reasoning signals arise in intermediate layers and causally influence final predictions. It uses Logit Lens to reveal these signals and an intervention to demonstrate their causal role, then locates key MHSA modules via causal mediation-inspired analysis. The authors introduce CREME, a lightweight patching method that edits MHSA outputs to insert corrective information, achieving strong improvements over baselines and generalizing to paraphrased and related queries while limiting effects on irrelevant inputs. This work advances mechanistic interpretability and offers a practical, generalizable approach to autonomously enhancing compositional reasoning in LLMs.

Abstract

LLMs have marked a revolutionary shift, yet they falter when faced with compositional reasoning tasks. Our research embarks on a quest to uncover the root causes of compositional reasoning failures of LLMs, finding that most of them stem from improperly generated or leveraged implicit reasoning results. Inspired by our empirical findings, we resort to Logit Lens and an intervention experiment to dissect the inner hidden states of LLMs. This deep dive reveals that implicit reasoning results indeed surface within middle layers and play a causative role in shaping the final explicit reasoning results. Our exploration further locates multi-head self-attention (MHSA) modules within these layers, which emerge as the linchpins in accurate generation and leveraging of implicit reasoning results. Grounded on the above findings, we develop CREME, a lightweight method to patch errors in compositional reasoning via editing the located MHSA modules. Our empirical evidence stands testament to CREME's effectiveness, paving the way for autonomously and continuously enhancing compositional reasoning capabilities in language models.
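
The Logit Lens inspection mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each layer's hidden state at the last position is projected through the unembedding matrix and the probability of a chosen token is read off per layer. The weights, dimensions, token ids, and the function name `logit_lens` are hypothetical toy values, the final layer normalization is omitted for brevity, and the paper's exact inspecting value may differ from this one.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, unembed, token_id):
    """Return the probability of `token_id` read from each layer.

    hidden_states: (num_layers, d_model), last-position state per layer
    unembed:       (d_model, vocab_size), unembedding matrix
    """
    logits = hidden_states @ unembed   # (num_layers, vocab_size)
    probs = softmax(logits)            # per-layer vocabulary distribution
    return probs[:, token_id]

# Toy stand-ins for a real model's activations (assumption: random values).
rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 6, 8, 20
hidden_states = rng.normal(size=(num_layers, d_model))
unembed = rng.normal(size=(d_model, vocab_size))

implicit_id, explicit_id = 3, 7        # hypothetical token ids
implicit_curve = logit_lens(hidden_states, unembed, implicit_id)
explicit_curve = logit_lens(hidden_states, unembed, explicit_id)
```

With a real model, `implicit_curve` and `explicit_curve` would be the per-layer traces for the implicit (first-hop) and explicit (final) answers; here they only demonstrate the shape of the computation.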

Paper Structure (56 sections, 13 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: An example of a typical compositional reasoning error pattern, Short-Cut (see the section on identifying error types). Before patching, the LLM takes a reasoning shortcut by directly binding “The nationality” to “C. Auguste Dupin” (a fictional French detective) and incorrectly predicts “France”. After patching, the LLM first binds “the creator of” to “C. Auguste Dupin” to produce “Edgar Allan Poe” (the implicit reasoning result) and then correctly predicts “U.S.A.” (the explicit reasoning result).
  • Figure 2: Logit Lens results for examples of the three error types. Comp is the result for the compositional two-hop query; Reference is the result for the corresponding second-hop query (serving as the reference for the compositional query). Red and blue lines trace the implicit and explicit results respectively. The y-axis shows the inspecting value (as defined by the paper's Logit Lens equation).
  • Figure 3: Logit Lens inspection results with LLaMA-2-7B. (a) shows the averaged result for compositional two-hop queries and (b) shows the averaged result for second-hop queries. The x-axis is the layer; the y-axis is the averaged Logit Lens value after min-max normalization (i.e., the original values are linearly mapped to $[0,1]$). The yellow and blue lines trace the implicit and explicit results respectively.
  • Figure 4: Intervention experiment. A brighter color indicates a more significant intervention effect. In each subfigure, the upper row is the experiment group and the lower row is the comparison group. Note that, for better visualization, non-positive effect values ($\le 0$) are clipped to $0$ for both the experiment and comparison groups.
  • Figure 5: AIE for replacements. “last”: last token; “subject”: last subject token; “mlp”: replace the MLP output; “attn”: replace the MHSA output. Brighter positions indicate replacements with a larger effect (more important).
  • ...and 8 more figures
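
The two post-processing steps named in the captions of Figures 3 and 4 (min-max normalization to $[0,1]$ and clipping non-positive effects to $0$) can be sketched as below. The helper names and the sample numbers are hypothetical, and whether the paper normalizes before or after averaging is not stated in the captions, so this is only an illustration of the two operations themselves.

```python
import numpy as np

def min_max_normalize(values):
    # Linearly map values to [0, 1]; a constant input maps to all zeros.
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

def clip_negative(effects):
    # Replace effect values below zero with zero (for visualization only).
    return np.maximum(np.asarray(effects, dtype=float), 0.0)

# Hypothetical per-layer Logit Lens values and intervention effects.
layer_values = [0.02, 0.10, 0.35, 0.20]
normalized = min_max_normalize(layer_values)

raw_effects = [-0.3, 0.0, 0.8]
clipped = clip_negative(raw_effects)
```

After normalization the smallest per-layer value becomes 0 and the largest becomes 1, which is what makes curves from different queries comparable on a shared axis.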