What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang, Yahui Liu, Zhaoyi Li, Qi Wang, Fuzheng Zhang, Linqi Song, Ying Wei, Defu Lian
TL;DR
This work asks which structural features of long chain-of-thought (LCoT) reasoning predict correct final answers in large language models. It introduces LCoT2Tree, an automated pipeline that converts sequential LCoTs into hierarchical trees and analyzes them with graph neural networks to extract patterns such as exploration, backtracking, and verification. The study shows that these structural patterns predict reasoning success better than simple length-based metrics, offers explainability by identifying influential subgraphs, and reveals task- and model-specific differences in reasoning behavior. In practice, integrating the tree-based reasoning signals into Best-of-N decoding improves selection quality across diverse tasks and models. Overall, the work positions internal reasoning structure as a key diagnostic and optimization target for improving LLM reasoning performance.
Abstract
Recent advances in reasoning with large language models (LLMs) have popularized Long Chain-of-Thought (LCoT), a strategy that encourages deliberate, step-by-step reasoning before producing a final answer. While LCoTs have enabled expert-level performance in complex tasks, how the internal structures of their reasoning chains drive, or even predict, the correctness of final answers remains a critical yet underexplored question. In this work, we present LCoT2Tree, an automated framework that converts sequential LCoTs into hierarchical tree structures and thus enables deeper structural analysis of LLM reasoning. Using graph neural networks (GNNs), we reveal that structural patterns extracted by LCoT2Tree, including exploration, backtracking, and verification, serve as stronger predictors of final performance than simple length-based metrics across a wide range of tasks and models. Leveraging an explainability technique, we further identify critical thought patterns, such as over-branching, that account for failures. Beyond diagnostic insights, the structural patterns extracted by LCoT2Tree support practical applications, including improving the effectiveness of Best-of-N decoding. Overall, our results underscore the critical role of the internal structures of reasoning chains, positioning LCoT2Tree as a powerful tool for diagnosing, interpreting, and improving reasoning in LLMs.
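To make the idea of structure-aware Best-of-N selection concrete, the following is a minimal sketch and not the LCoT2Tree implementation. It assumes each candidate response has already been converted into a tree of reasoning steps, and it replaces the learned GNN scorer with a hand-written heuristic that penalizes over-branching. The names `ThoughtNode`, `max_branching`, `toy_score`, and `branch_limit` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    """One reasoning step; children are the steps that branch from it."""
    text: str
    children: list["ThoughtNode"] = field(default_factory=list)


def max_branching(node: ThoughtNode) -> int:
    """Largest number of children under any single node in the tree."""
    return max([len(node.children)] + [max_branching(c) for c in node.children])


def toy_score(tree: ThoughtNode, branch_limit: int = 3) -> float:
    """Hypothetical stand-in for a learned scorer: penalize over-branching.

    branch_limit is an assumed threshold, not a value from the paper.
    """
    excess = max(0, max_branching(tree) - branch_limit)
    return float(-excess)


def best_of_n(candidates: list[tuple[str, ThoughtNode]]) -> str:
    """Return the answer whose reasoning tree receives the highest score."""
    answer, _ = max(candidates, key=lambda pair: toy_score(pair[1]))
    return answer


# Two toy candidates: one linear chain ending in a verification step,
# and one root that fans out into six shallow attempts.
focused = ThoughtNode("root", [ThoughtNode("step", [ThoughtNode("verify")])])
scattered = ThoughtNode("root", [ThoughtNode(f"try {i}") for i in range(6)])

print(best_of_n([("A", scattered), ("B", focused)]))  # prints "B"
```

In this toy setup the scattered candidate exceeds the assumed branching threshold and is ranked below the focused one. A learned scorer over the full tree would replace `toy_score` while leaving the selection step unchanged.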
