Test-time Scaling of LLMs: A Survey from A Subproblem Structure Perspective
Zhuoyi Yang, Xu Guo, Tong Zhang, Huijuan Xu, Boyang Li
TL;DR
The paper tackles how to improve LLM/VLM inference accuracy by spending more compute at test time without altering model weights. It introduces a unifying perspective based on subproblem structure (sequential, parallel, tree) to categorize TTS methods, including CoT, ToT, and related hybrids. The authors survey task decomposition strategies (human-only and LLM-assisted) and analyze reasoning-path approaches across unimodal, multimodal, and RAG settings, detailing strengths, weaknesses, and mitigation techniques. They conclude with promising directions—meta-reasoning, efficient multi-path exploration, and extending tree-based reasoning to multimodal/RAG systems—that could yield more efficient, robust, and generalizable reasoning systems.
Abstract
In this paper, we survey techniques for improving the predictive accuracy of pretrained large language models by allocating additional compute at inference time. In categorizing test-time scaling methods, we place special emphasis on how a problem is decomposed into subproblems and on the topological organization of these subproblems, whether sequential, parallel, or tree-structured. This perspective allows us to unify diverse approaches such as Chain-of-Thought, Branch-Solve-Merge, and Tree-of-Thought under a common lens. We further synthesize existing analyses of these techniques, highlighting their respective strengths and weaknesses, and conclude by outlining promising directions for future research.
