Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Siddarth Venkatraman, Vineet Jain, Sarthak Mittal, Vedant Shah, Johan Obando-Ceron, Yoshua Bengio, Brian R. Bartoldson, Bhavya Kailkhura, Guillaume Lajoie, Glen Berseth, Nikolay Malkin, Moksh Jain
TL;DR
This paper introduces Recursive Self-Aggregation (RSA), a hybrid test-time scaling framework that treats reasoning as an evolutionary process: it maintains a population of candidate solutions and recursively aggregates subsets of them to produce improved reasoning chains. RSA blends parallel exploration with sequential refinement, enabling deeper thinking without external verifiers and using the full reasoning traces rather than only the final answers. The authors show that RSA delivers substantial Pass@1 gains across math, code, reasoning, and knowledge tasks and across diverse model families, narrowing the gap between small instruction-tuned models and larger reasoning models. They further show that aggregation-aware reinforcement learning, which trains models to aggregate solutions, yields additional performance gains. The work provides practical guidance for deploying deeper test-time thinking under compute budgets and suggests directions for future improvement, such as explicit fitness functions and multi-step aggregation policies.
Abstract
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains (not just the final answers) and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families, and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code is available at https://github.com/HyperPotatoNeo/RSA.
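
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation (see the linked repository for that). The function name `recursive_self_aggregation`, the parameters `population_size`, `subset_size`, and `num_steps`, and the `generate` and `aggregate` callables are all hypothetical stand-ins for LLM calls. The sketch also assumes that each step draws one uniformly random subset per new candidate, which the abstract does not specify.

```python
import random
from typing import Callable, List

# Hypothetical stand-ins for LLM calls (not part of any real library):
#   generate(question)              -> one candidate reasoning chain
#   aggregate(question, candidates) -> one improved chain built from a subset
Generate = Callable[[str], str]
Aggregate = Callable[[str, List[str]], str]


def recursive_self_aggregation(
    question: str,
    generate: Generate,
    aggregate: Aggregate,
    population_size: int = 8,
    subset_size: int = 3,
    num_steps: int = 4,
    seed: int = 0,
) -> List[str]:
    """Return the final population of candidate reasoning chains."""
    if not 1 <= subset_size <= population_size:
        raise ValueError("subset_size must be between 1 and population_size")

    rng = random.Random(seed)

    # Parallel exploration: start from independently sampled candidates.
    population = [generate(question) for _ in range(population_size)]

    # Sequential refinement: each step rebuilds the whole population by
    # aggregating subsets of the previous one.
    for _ in range(num_steps):
        population = [
            # Assumption: subsets are drawn uniformly without replacement.
            aggregate(question, rng.sample(population, subset_size))
            for _ in range(population_size)
        ]

    return population
```

Under these assumptions the total cost is `population_size` generation calls plus `population_size * num_steps` aggregation calls, which makes the trade-off between population size and recursion depth under a fixed compute budget explicit. How a single answer is chosen from the final population is left open here.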
