Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

Sekh Mainul Islam, Pepa Atanasova, Isabelle Augenstein

TL;DR

This work tackles how LLMs integrate Parametric Knowledge (PK) and Context Knowledge (CK) when generating Natural Language Explanations (NLEs). It introduces a rank-2 projection subspace that disentangles PK and CK contributions across multi-step NLE sequences, addressing limitations of prior rank-1 approaches. Across four QA datasets and three open-weight LLMs, it shows the rank-2 framework captures diverse interaction types (Supportive, Complementary, Conflicting) and reveals dynamics: NLEs initially rely on PK, CK alignment grows for conflicting cases, and Chain-of-Thought prompting shifts toward CK. The findings offer a mechanistic, internal signal for detecting hallucination and grounding fidelity, suggesting avenues for controllable steering of PK–CK balance to improve factual consistency in generation.

Abstract

Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions, drawing on both external Context Knowledge (CK) and Parametric Knowledge (PK) stored in model weights. Understanding their interaction is key to assessing the grounding of NLEs, yet it remains underexplored. Prior work has largely examined only single-step generation, typically the final answer, and has modelled PK and CK interaction only as a binary choice in a rank-1 subspace. This overlooks richer forms of interaction, such as complementary or supportive knowledge. We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences. Experiments on four QA datasets and three open-weight instruction-tuned LLMs show that diverse knowledge interactions are poorly represented in a rank-1 subspace but are effectively captured in our rank-2 formulation. Our multi-step analysis reveals that hallucinated NLEs align strongly with the PK direction, context-faithful ones balance PK and CK, and Chain-of-Thought prompting for NLEs shifts generated NLEs toward CK by reducing PK reliance. This work provides the first framework for systematic studies of multi-step knowledge interactions in LLMs through a richer rank-2 subspace disentanglement. Code and data: https://github.com/copenlu/pk-ck-knowledge-disentanglement.

Paper Structure

This paper contains 20 sections, 1 theorem, 8 equations, 16 figures, 1 table.

Key Result

Theorem 1

Let the hidden representation $\vec{\mathbf{h}}_i$ for the input $x_i$ at sequence step $i$ be decomposed as $\vec{\mathbf{h}}_i = c_i \vec{\mathbf{u}}_{CK} + p_i \vec{\mathbf{u}}_{PK} + \boldsymbol{\xi}_i$, where $\vec{\mathbf{u}}_{CK},\vec{\mathbf{u}}_{PK}$ are orthonormal directions corresponding to context and parametric knowledge, $c_i,p_i \in \mathbb{R}$ are their contributions, and $\boldsymbol{\xi}_i$ is noise orthogonal to their span. A rank-1 probe with vector $\vec{\mathbf{v}}$

Figures (16)

  • Figure 1: The Llama-3.1-8B-Instruct model combines parametric (green) and contextual (red) knowledge to generate NLEs. Projection onto a learned low-rank subspace $\mathbf{P}$ disentangles their contributions -- rank-1 discards richer interactions, while rank-2 separates complementary and conflicting components.
  • Figure 2: Kernel Density Estimate (KDE) of the PK-CK subspace component $\langle \vec{\mathbf{u}}^{T}, \vec{\mathbf{h}}_{i} \rangle$ across different knowledge interaction types for four QA datasets using Mistral-7B-Instruct-v0.3. The split noise denotes cases where answers from individual knowledge sources agree with each other but differ from the final answer, i.e., $a(q, c) = a(q, \varepsilon)$ and $a(q, c) \neq a$.
  • Figure 3: Cumulative explained variance ($EV_r$) at rank $r$ for the three models on the four QA datasets. At rank 2 it reaches a value of $1.0$, indicating that two dimensions suffice to capture the different knowledge interaction variants.
  • Figure 4: Patchscope on OpenBookQA from Meta-Llama-3.1-8B-Instruct. a) Activation patching on $\mathcal{D}_w^{(b \rightarrow p)}$. b) Activation patching on $\mathcal{D}_w^{(b \rightarrow c)}$.
  • Figure 5: Individual PK-CK contribution in generating the answer token $a$ for all the datasets from the Meta-Llama-3.1-8B-Instruct model.
  • ...and 11 more figures
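
The cumulative explained variance reported in Figure 3 can be illustrated with a small numerical sketch. The exact definition of $EV_r$ is not given in this summary, so the code below assumes the common SVD-based one (share of squared singular values captured by the top $r$ directions). The hidden states are synthetic and noise-free, built from two hypothetical orthonormal directions `u_ck` and `u_pk`; they are not the paper's data.

```python
import numpy as np

def cumulative_explained_variance(H):
    """Share of total variance captured by the top-r singular directions, for r = 1, 2, ...

    Assumed definition (not taken from the paper): cumulative sum of squared
    singular values of the mean-centered matrix, divided by their total.
    """
    Hc = H - H.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Hc, compute_uv=False)
    energy = s ** 2
    return np.cumsum(energy) / energy.sum()

rng = np.random.default_rng(0)
d, n = 16, 200  # hypothetical hidden size and number of steps

# Two orthonormal directions standing in for the CK and PK axes.
Q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
u_ck, u_pk = Q[:, 0], Q[:, 1]

# Synthetic hidden states: independent CK and PK contributions, no noise.
c = rng.normal(size=n)
p = rng.normal(size=n)
H = np.outer(c, u_ck) + np.outer(p, u_pk)

ev = cumulative_explained_variance(H)
print("EV at rank 1:", ev[0])
print("EV at rank 2:", ev[1])
```

In this toy setting a single direction leaves a sizeable share of the variance unexplained, while two directions account for all of it, which is the pattern the caption describes.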

Theorems & Definitions (2)

  • Theorem 1: Non-identifiability under rank-1
  • proof
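
The non-identifiability named in Theorem 1 can be illustrated numerically. This is a minimal sketch, not the paper's proof: it assumes the decomposition $\vec{\mathbf{h}}_i = c_i \vec{\mathbf{u}}_{CK} + p_i \vec{\mathbf{u}}_{PK} + \boldsymbol{\xi}_i$ with orthonormal directions, and the probe vector `v`, the dimensions, and the coefficient values are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size

# Three orthonormal vectors: CK direction, PK direction, and a noise direction
# orthogonal to their span.
Q, _ = np.linalg.qr(rng.normal(size=(d, 3)))
u_ck, u_pk, u_noise = Q[:, 0], Q[:, 1], Q[:, 2]

def hidden(c, p, noise_scale=0.3):
    """Hidden state under the assumed decomposition h = c*u_CK + p*u_PK + xi."""
    return c * u_ck + p * u_pk + noise_scale * u_noise

# A hypothetical rank-1 probe that treats CK vs. PK as a single binary axis.
v = (u_ck - u_pk) / np.sqrt(2)

h_a = hidden(2.0, 2.0)  # both sources contribute strongly
h_b = hidden(0.5, 0.5)  # both sources contribute weakly

# Rank-1: the probe sees only (c - p) / sqrt(2), so both states collapse to 0.
rank1_a = v @ h_a
rank1_b = v @ h_b

# Rank-2: projecting onto both directions recovers (c, p) separately.
U = np.stack([u_ck, u_pk])  # shape (2, d)
rank2_a = U @ h_a
rank2_b = U @ h_b

print("rank-1 scores:", rank1_a, rank1_b)
print("rank-2 coordinates:", rank2_a, rank2_b)
```

Any rank-1 probe yields a single scalar $\alpha c_i + \beta p_i$, so distinct $(c_i, p_i)$ pairs on the same level line are indistinguishable; the two-dimensional projection keeps them apart.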