Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi; Hiroki Furuta; Takeshi Kojima; Yusuke Iwasawa; Yutaka Matsuo

TL;DR

The paper introduces reasoning graphs extracted from the hidden states of large reasoning models and analyzes three graph-theoretic properties (cyclicity, graph diameter, and small-world index) to understand their reasoning mechanisms. By clustering segment representations and tracing the sequence of node visits, the authors compare base models and large reasoning models on GSM8K, MATH500, and AIME 2024. They find that large reasoning models exhibit about $5$ cycles per sample, much larger diameters, and pronounced small-world characteristics (roughly $\times 6$ relative to base models), and that these properties correlate with accuracy. The study further shows that supervised fine-tuning on improved datasets expands reasoning graph diameters and improves performance, offering concrete data-construction guidelines for boosting reasoning. Together, these results link internal graph-structural properties to empirical reasoning gains, informing interpretability and training-data design for advanced LLMs.
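The graph-extraction idea (cluster segment representations, then trace which cluster each step visits) can be illustrated with a minimal sketch. It assumes the cluster centroids are already available, for example from a k-means fit, and uses toy 2-D vectors in place of real hidden states. The function names `assign_nodes` and `build_reasoning_graph` are hypothetical, and skipping self-transitions is an assumption rather than something the paper states.

```python
from math import dist

def assign_nodes(segment_vectors, centroids):
    """Map each segment representation to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(vec, centroids[k]))
        for vec in segment_vectors
    ]

def build_reasoning_graph(node_path):
    """Connect consecutively visited nodes with directed edges."""
    edges = {}
    for src, dst in zip(node_path, node_path[1:]):
        if src != dst:  # assumption: ignore self-transitions
            edges.setdefault(src, set()).add(dst)
    return edges

# Toy stand-ins: three centroids and five reasoning-step representations.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
segments = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (0.0, 0.1), (1.1, 0.0)]

path = assign_nodes(segments, centroids)   # sequence of visited nodes
graph = build_reasoning_graph(path)        # adjacency: node -> successors
print(path, graph)
```

In this toy run the path returns to its first node after three steps, which is the kind of revisit the cyclicity analysis looks for.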

Abstract

Recent large-scale reasoning models have achieved state-of-the-art performance on challenging mathematical benchmarks, yet the internal mechanisms underlying their success remain poorly understood. In this work, we introduce the notion of a reasoning graph, extracted by clustering hidden-state representations at each reasoning step, and systematically analyze three key graph-theoretic properties: cyclicity, diameter, and small-world index, across multiple tasks (GSM8K, MATH500, AIME 2024). Our findings reveal that distilled reasoning models (e.g., DeepSeek-R1-Distill-Qwen-32B) exhibit significantly more recurrent cycles (about 5 per sample), substantially larger graph diameters, and pronounced small-world characteristics (about 6x) compared to their base counterparts. Notably, these structural advantages grow with task difficulty and model capacity, with cycle detection peaking at the 14B scale and exploration diameter maximized in the 32B variant, correlating positively with accuracy. Furthermore, we show that supervised fine-tuning on an improved dataset systematically expands reasoning graph diameters in tandem with performance gains, offering concrete guidelines for dataset design aimed at boosting reasoning capabilities. By bridging theoretical insights into reasoning graph structures with practical recommendations for data construction, our work advances both the interpretability and the efficacy of large reasoning models.
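As a rough illustration of the three properties named above, the sketch below computes them on a toy sequence of visited nodes. Everything here is an assumption made for illustration: cycles are approximated by counting returns to an already-visited node, the diameter is measured in hops on an undirected, unweighted graph, and the small-world ratio is simply the clustering coefficient divided by the average path length. The paper's exact definitions (for instance, edge weighting or normalization against random graphs) may differ.

```python
from collections import deque
from itertools import combinations

def count_cycles(node_path):
    """Count returns to an already-visited node (a simple proxy for cycles)."""
    seen, cycles, prev = set(), 0, None
    for node in node_path:
        if node != prev and node in seen:
            cycles += 1
        seen.add(node)
        prev = node
    return cycles

def undirected(node_path):
    """Undirected adjacency built from consecutive node visits."""
    adj = {n: set() for n in node_path}
    for a, b in zip(node_path, node_path[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def diameter(adj):
    """Longest shortest path over all reachable node pairs."""
    return max(max(bfs_distances(adj, n).values()) for n in adj)

def average_path_length(adj):
    """Mean hop distance over all reachable pairs of distinct nodes."""
    lengths = [d for n in adj for d in bfs_distances(adj, n).values() if d > 0]
    return sum(lengths) / len(lengths)

def clustering_coefficient(adj):
    """Mean fraction of each node's neighbor pairs that are themselves linked."""
    scores = []
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        linked = sum(b in adj[a] for a, b in pairs)
        scores.append(linked / len(pairs) if pairs else 0.0)
    return sum(scores) / len(scores)

path = [0, 1, 2, 0, 3, 4]          # toy sequence of visited nodes
adj = undirected(path)
cycles = count_cycles(path)
diam = diameter(adj)
ratio = clustering_coefficient(adj) / average_path_length(adj)
print(cycles, diam, round(ratio, 3))
```

On real data the node sequence would come from clustered hidden states, and comparing these quantities between a base model and its reasoning counterpart is what the analysis described above amounts to.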
