REL: Working out is all you need

Toby Simonds, Jey Han Lau, Chaithanya Bandi

TL;DR

The paper addresses the gap between current LLM reasoning and the human-like exploratory problem-solving seen in O1 by proposing a data-driven pathway built around worked solutions. It introduces ReasonSet, a high-quality dataset of problem-solving traces, and the Reasoning Enhancement Loop (REL), a critic-generator pipeline that iteratively creates and validates worked solutions to improve planning and reasoning in models. Empirical results show that training on worked solutions yields substantial gains (e.g., 18.89% on AIME 2024 with GPT-4o mini, versus a 6.66% baseline), and REL can further boost performance (up to 27.78% on AIME 2024), though this remains short of O1's 44.6%. The work also demonstrates that data quality and structured demonstrations can outperform sheer data quantity, and it releases ReasonSet and O1-Llama 3.2 3B to facilitate broader, open access to advanced reasoning capabilities.

Abstract

Recent developments, particularly OpenAI's O1 model, have demonstrated the remarkable potential of Large Language Models (LLMs) for complex reasoning tasks. Through analysis of O1's outputs and provided sample Chain-of-Thought (CoT) demonstrations, we observe that it approaches problem-solving in a distinctly human-like manner, systematically brainstorming ideas, testing hypotheses, verifying results, and planning comprehensive solutions. These sophisticated reasoning capabilities remain notably absent in other state-of-the-art language models. In this paper, we hypothesize that this performance gap stems from the limited availability of high-quality reasoning process data in current training sets. We demonstrate that by constructing a specialized dataset focused on explicit problem-solving workflows ("worked solutions"), we can elicit substantially improved planning capabilities from existing models. Additionally, we propose the Reasoning Enhancement Loop (REL), a method for generating synthetic worked solutions.

Paper Structure

This paper contains 15 sections, 6 figures.

Figures (6)

  • Figure 1: The Reasoning Enhancement Loop (REL) process. The system begins with initial training on human-generated solutions, then enters an iterative loop where the solution generator creates solutions that are verified and corrected through a hint-based process. Successful solutions are added to the training dataset for model refinement.
  • Figure 2: Performance comparison between models trained on human-generated worked solutions versus standard AIME solutions. Results show superior scaling of human-annotated solutions compared to traditional AIME solution sets.
  • Figure 3: Left: Performance improvement across REL iterations. Right: Final performance comparison between base GPT-4o, our REL fine-tuned GPT-4o, and O1 on AIME 2024.
  • Figure 4: Comparison of solution approaches between FT GPT-4o and O1 on a complex polynomial problem.
  • Figure 5: Performance comparison between REL FT GPT-4o and standard GPT-4o outputs. Results demonstrate superior scaling of REL-generated solutions compared to traditional synthetic data generation approaches.
  • ...and 1 more figure
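
The loop described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callables `fine_tune`, `generate`, and `make_hint`, the parameters `rounds` and `max_hints`, and the exact-match answer check are all assumptions introduced here for clarity.

```python
def rel_iteration(problems, generate, make_hint, max_hints=2):
    """One pass of a hint-based generate/verify loop (hypothetical sketch).

    problems:  list of (question, reference_answer) pairs
    generate:  assumed callable (question, hint) -> (worked_solution, final_answer)
    make_hint: assumed callable (question, failed_solution, reference_answer) -> hint
    """
    accepted = []
    for question, reference in problems:
        hint = None  # the first attempt is made without any hint
        for _ in range(max_hints + 1):
            solution, answer = generate(question, hint)
            if answer == reference:  # verification: assumed exact answer match
                accepted.append((question, solution))
                break
            # The attempt failed, so ask the critic for a hint and retry.
            hint = make_hint(question, solution, reference)
    return accepted


def reasoning_enhancement_loop(seed_data, problems, fine_tune, make_hint,
                               rounds=2, max_hints=2):
    """Train on seed solutions, then alternate generation and retraining.

    fine_tune: assumed callable (dataset) -> generate callable for the new model
    """
    dataset = list(seed_data)          # human-written worked solutions
    generate = fine_tune(dataset)      # initial training
    for _ in range(rounds):
        new_examples = rel_iteration(problems, generate, make_hint, max_hints)
        dataset.extend(new_examples)   # only verified solutions are added
        generate = fine_tune(dataset)  # refine the model on the grown dataset
    return generate, dataset
```

In this sketch a solution enters the training set only after its final answer matches the reference, so failed attempts never reach the dataset. A real pipeline would likely also deduplicate solutions across rounds and use a more forgiving answer comparison.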