When Reasoning Meets Its Laws
Junyu Zhang, Yifan Sun, Tianang Leng, Jingyan Shen, Liu Ziyin, Paul Pu Liang, Huan Zhang
TL;DR
The paper introduces the Laws of Reasoning (LoRe), a framework that formalizes how Large Reasoning Models (LRMs) should allocate compute, and how their accuracy should degrade, as task complexity grows. It states a compute law and an accuracy law, each linking the respective quantity to a complexity measure. Because complexity is hard to measure directly, it tests the laws through two tractable properties, monotonicity and compositionality, and builds LoRe-Bench to evaluate them. Empirically, current LRMs show reasonable monotonicity but poor compositionality. This motivates a compositionality-focused finetuning method (SFT-Compo), which improves results across multiple benchmarks and model sizes and yields synergistic gains across properties. The work offers both a theoretical lens and a practical toolkit for steering LRMs toward more human-like trade-offs in reasoning, with open-source resources for reproduction and further research.
Abstract
Despite the superior performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, leading to suboptimal reasoning capabilities. To theoretically formalize the desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose the compute law, which hypothesizes that reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Since question complexity is difficult to quantify in practice, we examine these hypotheses through two properties of the laws, monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two tractable properties for large reasoning models. Evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective finetuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with compute laws yields consistently improved reasoning performance on multiple benchmarks, and uncovers synergistic effects across properties and laws. Project page: https://lore-project.github.io/
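To make the two properties concrete, here is a minimal sketch of how they could be checked on measured reasoning compute (for example, generated token counts). It is an illustration under stated assumptions, not the LoRe-Bench metrics: the function names `is_monotone` and `compositionality_gap`, the relative-deviation formula, and all numbers are hypothetical. The additive prediction assumes that a composed question is made of two independent sub-questions whose complexities add, so a compute law linear in complexity would predict additive compute.

```python
from typing import Sequence


def is_monotone(compute_by_level: Sequence[float]) -> bool:
    """Return True if compute never decreases as the complexity level rises.

    compute_by_level[i] is the mean reasoning compute (e.g. tokens) measured
    on questions of complexity level i, ordered from easiest to hardest.
    """
    return all(a <= b for a, b in zip(compute_by_level, compute_by_level[1:]))


def compositionality_gap(c_first: float, c_second: float, c_composed: float) -> float:
    """Relative deviation of composed-question compute from the additive prediction.

    Assumption: the composed question combines two independent sub-questions,
    so a linear compute law predicts c_composed == c_first + c_second.
    A gap of 0.0 means perfect compositionality under that assumption.
    """
    predicted = c_first + c_second
    if predicted <= 0:
        raise ValueError("sub-question compute must sum to a positive number")
    return abs(c_composed - predicted) / predicted


# Hypothetical measurements, invented purely for illustration.
tokens_by_level = [120.0, 250.0, 410.0, 640.0]
monotone = is_monotone(tokens_by_level)
gap = compositionality_gap(c_first=300.0, c_second=500.0, c_composed=560.0)
print(f"monotone: {monotone}, compositionality gap: {gap:.2f}")
```

In this invented example the model spends more compute at each harder level, yet uses well under the summed compute on the composed question, which mirrors the qualitative pattern the abstract reports: reasonable monotonicity alongside weak compositionality.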
