Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation

Guangyue Peng; Zongchao Chen; Wen Luo; Yuntao Wen; Wei Li; Ruixiang Feng; Ran Le; Chen Yang; Zhenwei An; Yang Song; Tao Zhang; Houfeng Wang

TL;DR

The paper proposes Structural Skeleton-guided Reasoning (SSR), a two-phase approach that first generates an answer-invariant functional skeleton and then uses that skeleton to guide full trace generation. SSR consistently reduces anchoring across all three measured levels (lexical, entropic, and probabilistic).

Abstract

Reverse Chain-of-Thought Generation (RCG) synthesizes reasoning traces from query-answer pairs, but it risks producing post-hoc rationalizations: when a model can see the answer during generation, the answer acts as a cognitive anchor that shapes the entire explanation. We formalize this phenomenon through a three-level measurement hierarchy of lexical, entropic, and probabilistic anchoring, which capture surface artifacts, entropy dynamics, and latent answer dependence, respectively. We then analyze semantic suppression, the intuitive mitigation strategy of instructing the model to ignore the answer, and find it counterproductive: while it reduces lexical overlap, it paradoxically increases entropic and probabilistic anchoring. Drawing on Ironic Process Theory from cognitive psychology, we attribute this failure to active monitoring of the forbidden answer, which inadvertently deepens dependence on it. To break this cycle, we propose Structural Skeleton-guided Reasoning (SSR), a two-phase approach that first generates an answer-invariant functional skeleton and then uses this skeleton to guide full trace generation. By redirecting the information flow toward structural planning rather than answer monitoring, SSR consistently reduces anchoring across all three levels. We further introduce Distilled SSR (SSR-D), which fine-tunes models on teacher-generated SSR traces to ensure reliable structural adherence. Experiments across open-ended reasoning benchmarks demonstrate that SSR-D achieves up to 10% improvement over suppression baselines while preserving out-of-distribution (OOD) generalization.
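To make the notion of lexical overlap between a reasoning trace and a pre-committed answer concrete, the following is a minimal sketch of one possible surface-level proxy. It is not the paper's definition of lexical anchoring, which is not reproduced on this page; the function names `ngrams` and `lexical_overlap`, the whitespace tokenization, and the choice of bigrams are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_overlap(trace, answer, n=2):
    """Illustrative proxy (an assumption, not the paper's metric):
    the fraction of the answer's n-grams that also occur in the trace.

    Returns a value in [0, 1]; 0.0 if the answer is shorter than n tokens.
    """
    # Naive lowercase whitespace tokenization, chosen only for simplicity.
    trace_tokens = trace.lower().split()
    answer_tokens = answer.lower().split()

    answer_grams = ngrams(answer_tokens, n)
    if not answer_grams:
        return 0.0

    trace_grams = set(ngrams(trace_tokens, n))
    hits = sum(1 for gram in answer_grams if gram in trace_grams)
    return hits / len(answer_grams)
```

For example, `lexical_overlap("the cat ran", "the cat sat")` gives 0.5, because one of the answer's two bigrams appears in the trace. A proxy of this kind only sees surface reuse, which is consistent with the abstract's point that a trace can score low on lexical overlap while still depending on the answer at the entropic or probabilistic level.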

Paper Structure (66 sections, 2 theorems, 20 equations, 4 figures, 10 tables, 1 algorithm)

Introduction
Anchoring Measurement
Lexical Anchoring ($\mathcal{A}_{\textup{lex}}$)
Entropic Anchoring ($\mathcal{A}_{\textup{ent}}$)
Probabilistic Anchoring ($\mathcal{A}_{\textup{prob}}$)
Methodology
Baselines
Neutral Prompting (NEU).
Semantic Suppression (SUP).
Structural Skeleton-guided Reasoning (SSR)
Experiments
Data Construction
Behavioral Zones Construction
Observations
Suppression masks lexical anchoring but amplifies internal anchoring.
...and 51 more sections

Key Result

Proposition 3.2

For an $n$-step skeleton with discrete functional tags $f_i \in \mathcal{F}$, if all content summaries are $\epsilon$-functionally invariant, the mutual information between the pre-committed response $A$ and the skeleton $S$ is bounded by:

Figures (4)

Figure 1: Analysis of the relationship between Lexical Anchoring and Downstream Accuracy (from Table \ref{tab:main_results}). The plot reveals that lexical anchoring is a poor indicator of model utility: semantic suppression methods (SUP/AUG-SUP) tend to "deceive" this metric, achieving lower lexical anchoring while simultaneously suffering a drop in actual downstream performance. In contrast, our SSR method achieves improvements in both dimensions.
Figure 2: Paradigms in Reverse Chain-of-Thought Generation. The curved red and blue arrows indicate the anchoring effect of the pre-committed answer. (a) Post-hoc Rationalization: Visible answers cause shortcutting, resulting in rationalized reasoning chains. (b) Suppression Failure: Negative constraints trigger "ironic monitoring" of the forbidden answer, paradoxically maintaining high anchoring effect and highly rationalized chains. (c) Structural Skeleton-guided Reasoning (SSR): Decoupling content via an abstract skeleton redirects the anchoring, producing unanchored chains driven by structural information rather than answer dependency.
Figure 3: Behavioral zone construction via controlled reference conditions. Real CoT: standard generation without pre-committed response; +Prob Anchor: append response following standard CoT to induce predictive anchoring; +Entropy Anchor: use function-word skeleton of response to constrain exploration; Response as CoT: response used directly as reasoning trace. These conditions empirically locate four zones: Reason (bottom-left), Encode (top-left), Cloze (bottom-right), and Copy (top-right). Point color indicates lexical anchoring (blue: low, red: high).
Figure 4: Mechanism diagnosis across generation strategies. Suppression (SUP, AUG-SUP) shifts traces rightward and upward from NEU, migrating from Reason toward Cloze and Encode zones. SSR concentrates traces in Reason with minimal pathological spread. Point color indicates lexical anchoring (blue: low, red: high).
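The four behavioral zones in Figures 3 and 4 are laid out as quadrants of a two-dimensional plot. The sketch below shows how a point could be assigned to a zone given that layout. The captions do not state which anchoring measure lies on which axis or how the zone boundaries are set, so the generic coordinates `x` and `y`, the thresholds, and the function name `behavioral_zone` are hypothetical.

```python
def behavioral_zone(x, y, x_threshold, y_threshold):
    """Map a point to a zone using the quadrant layout from the captions:
    Reason (bottom-left), Encode (top-left), Cloze (bottom-right),
    Copy (top-right).

    The axis meanings and thresholds are assumptions for illustration;
    this page does not specify them.
    """
    right = x >= x_threshold  # right half of the plot
    top = y >= y_threshold    # upper half of the plot

    if right and top:
        return "Copy"
    if right:
        return "Cloze"
    if top:
        return "Encode"
    return "Reason"
```

Under this layout, the rightward and upward shift described for suppression in Figure 4 corresponds to points leaving the `"Reason"` quadrant for `"Cloze"` and `"Encode"`.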

Theorems & Definitions (10)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 3.1: $\epsilon$-Functional Invariance
Proposition 3.2: Skeleton Capacity Bound
proof
Definition 3.3: Monitoring Load
Proposition 3.4: SSR Monitoring Bypass
