RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Yuxiao Qu, Anikait Singh, Yoonho Lee, Amrith Setlur, Ruslan Salakhutdinov, Chelsea Finn, Aviral Kumar

TL;DR

RLAD addresses the limitations of long chain-of-thought reasoning by introducing reasoning abstractions—concise, natural-language priors that capture procedural and factual knowledge. It jointly trains an abstraction generator and an abstraction-conditioned solution generator via a cooperative RL paradigm, with a carefully designed reward to ensure abstractions aid rather than leak answers. Empirically, RLAD yields consistent improvements across math reasoning benchmarks (e.g., AIME 2025, AMC 2023, DeepScaleR Hard) and ARC-AGI, with larger gains when using multiple abstractions and when allocating compute to abstraction generation. The work highlights abstractions as a distinct, orthogonal axis for scaling test-time performance and improving generalization, offering a path toward broader and more robust reasoning in LLMs.

Abstract

Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires recognizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose multiple abstractions given a problem, followed by RL that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup effectively enables structured exploration, decouples learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that allocating more test-time compute to generating abstractions is more beneficial for performance than generating more solutions at large test budgets, illustrating the role of abstractions in guiding meaningful exploration.

Paper Structure

This paper contains 26 sections, 3 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Reasoning abstractions illustrated in the solution-space graph for a problem. We depict the solution space as a graph of intermediate steps leading to correct or incorrect answers. (1) Standard reasoning explores this space along one sequential chain. (2) We generate textual abstractions by summarizing which intermediate steps led to which outcomes. (3) Such abstractions can be reused to guide reasoning more efficiently.
  • Figure 2: Benefits from abstractions rely crucially on strength of the solver, abstraction length, and solution model. Most configurations fail to yield gains; only o4-mini with long and detailed abstractions shows consistent improvements across the GSM8k and GSMPlus datasets (left). The capability of the problem solver conditioned on abstractions also matters: even strong abstractions help only if the solution model is sufficiently capable (middle, right).
  • Figure 3: RLAD training paradigm. We train an abstraction generator, $\pi^\mathrm{abs}_\theta$, that proposes reasoning abstractions, denoted $\mathbf{z}$, conditioned on the question $\mathbf{x}$. Then, the solution generator, $\pi^\mathrm{sol}_\theta$, is trained to produce a response, $\tilde{\mathbf{y}}$, conditioned on the generated abstraction $\mathbf{z}$. The reward used for training $\pi^\mathrm{abs}_\theta$ corresponds to the average success rate of the solution generator conditioned on the proposed abstraction.
  • Figure 4: A typical example of a reasoning abstraction proposed by our abstraction generator. In the solution, we see (in blue) references to the abstraction ("cheatsheet") and keywords from the abstraction being used meaningfully in the reasoning trace of the solution generator model.
  • Figure 5: Tradeoff of abstraction and solution generation on AIME 2025. As the total inference compute budget increases (color scheme on the right), we find better performance when the budget is allocated to abstraction generation rather than solution generation, for all values of the normalization offset $k_0$ considered.
  • ...and 3 more figures
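
The abstraction-generator reward described in the Figure 3 caption (the average success rate of the solution generator conditioned on a proposed abstraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `abstraction_reward`, the callables `sample_solution` and `is_correct`, and the parameter `num_samples` are hypothetical stand-ins introduced here. The sketch also omits whatever additional reward design the paper uses to keep abstractions from leaking answers.

```python
from typing import Callable


def abstraction_reward(
    question: str,
    abstraction: str,
    sample_solution: Callable[[str, str], str],  # hypothetical solver: (x, z) -> answer
    is_correct: Callable[[str], bool],           # hypothetical answer checker
    num_samples: int = 4,                        # assumed number of solver rollouts
) -> float:
    """Fraction of sampled solutions that are correct given one abstraction."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    successes = 0
    for _ in range(num_samples):
        # Each rollout conditions the solver on both the question and the abstraction.
        answer = sample_solution(question, abstraction)
        successes += int(is_correct(answer))
    return successes / num_samples


# Toy usage: a scripted "solver" that is right on 3 of 4 rollouts.
scripted_answers = iter(["42", "41", "42", "42"])
reward = abstraction_reward(
    question="toy question",
    abstraction="toy abstraction",
    sample_solution=lambda x, z: next(scripted_answers),
    is_correct=lambda answer: answer == "42",
    num_samples=4,
)
print(reward)  # 0.75
```

In a real training loop the solver would be a stochastic language model, so this quantity is a Monte Carlo estimate whose variance shrinks as `num_samples` grows.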