Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

Oleg Somov, Mikhail Chaichuk, Mikhail Seleznyov, Alexander Panchenko, Elena Tutubalina

Abstract

Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures -- rubrics, checklists, verification queries -- before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention in up to 60% of cases -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; however, prompts that instruct the model to prioritize the intermediate structure over the original input do not materially close the gap. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.

Paper Structure (47 sections, 5 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: Illustration of a causal intervention on intermediate structures (rubrics). A model generates a rubric used to compute the final score. By editing the rubric (e.g., changing Q2 from True to False), we test whether the prediction is causally mediated by it. If the score updates accordingly, the model is faithful; otherwise it relies on hidden shortcuts.
  • Figure 2: Causal framing of intervention on intermediate structure. $X$: input (task + answer); $M$: intermediate structure (e.g., filled rubric); $Y$: final decision (grade). Faithful mediation (solid green paths): $X$ influences $Y$ through $M$; intervening via $\operatorname{do}(M{=}M^\star)$ changes $Y$. Unfaithful (dashed red): a direct $X \!\to\! Y$ path bypasses $M$; altering $M$ leaves $Y$ unchanged, revealing the model ignores the mediator.
  • Figure 3: Symmetry analysis. The X-axis shows faithfulness under Correction interventions (where an incorrect mediator is replaced with a correct one), and the Y-axis shows faithfulness under Counterfactual interventions (where a correct mediator is replaced with an incorrect one). Models with fewer than 10 generations in either subset are excluded due to noisy estimates.
  • Figure 4: The bar plot shows the measured faithfulness gap for each model on the three datasets before and after tool use. The green arrows highlight the reduction from the original gap to the post-tool-use gap for each model.
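
The intervention described in Figures 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the rubric is assumed to be a mapping from question IDs to booleans, the deterministic function is assumed to be "one point per satisfied criterion", and the names `score_from_rubric`, `flip`, `is_faithful`, `mediated_model`, and `make_shortcut_model` are all hypothetical stand-ins. The two toy predictors replace real LLM calls.

```python
from typing import Callable, Dict

Rubric = Dict[str, bool]
Predictor = Callable[[str, Rubric], int]


def score_from_rubric(rubric: Rubric) -> int:
    """Hypothetical deterministic mapping M -> Y: one point per True criterion."""
    return sum(rubric.values())


def flip(rubric: Rubric, key: str) -> Rubric:
    """Intervention do(M = M*): return a copy of the rubric with one entry flipped."""
    edited = dict(rubric)
    edited[key] = not edited[key]
    return edited


def is_faithful(predict: Predictor, x: str, rubric: Rubric, key: str) -> bool:
    """The prediction is faithful if it matches the score implied by the edited rubric."""
    edited = flip(rubric, key)
    return predict(x, edited) == score_from_rubric(edited)


def mediated_model(x: str, rubric: Rubric) -> int:
    """Toy predictor whose output depends only on the rubric (X -> M -> Y)."""
    return score_from_rubric(rubric)


def make_shortcut_model(original_rubric: Rubric) -> Predictor:
    """Toy predictor that ignores later edits, mimicking a direct X -> Y path."""
    cached = score_from_rubric(original_rubric)

    def predict(x: str, rubric: Rubric) -> int:
        return cached  # the edited rubric is never consulted

    return predict


rubric = {"Q1": True, "Q2": True, "Q3": False}
answer = "student answer text"  # placeholder input X

faithful_result = is_faithful(mediated_model, answer, rubric, "Q2")
shortcut_result = is_faithful(make_shortcut_model(rubric), answer, rubric, "Q2")
print(faithful_result, shortcut_result)  # True False
```

Because the mapping from rubric to score is deterministic, each edit has exactly one correct output, so a mismatch after the edit can be attributed to the predictor bypassing the rubric rather than to ambiguity in the task.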