AlphaResearch: Accelerating New Algorithm Discovery with Language Models

Zhaojian Yu, Kaiyue Feng, Yilun Zhao, Shilin He, Xiao-Ping Zhang, Arman Cohan

TL;DR

AlphaResearch addresses the challenge of autonomous algorithm discovery by coupling idea generation with program-based verification inside a dual environment that also mimics real-world peer-review rewards. The system trains AlphaResearch-RM-7B on ICLR peer-review data to score new ideas and uses an automatic code-executor to generate and evaluate programs, refining proposals iteratively. Empirically, AlphaResearch achieves a $2/8$ win rate against human researchers and surpasses human and AlphaEvolve baselines on the Packing Circles problem with a best-known score of $2.939$, while six other problems reveal remaining gaps in discovery capability. The paper introduces AlphaResearchComp, a curated benchmark of eight open-ended tasks, and analyzes the benefits and limitations of dual-reward autonomous search for open-ended scientific knowledge expansion, highlighting potential for future scaling and tool integration.

Abstract

Large language models have made significant progress on complex but easy-to-verify problems, yet they still struggle with discovering the unknown. In this paper, we present AlphaResearch, an autonomous research agent designed to discover new algorithms on open-ended problems. To balance the feasibility and innovation of the discovery process, we construct a novel dual research environment that combines execution-based verification with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iteratively running the following steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote a transparent evaluation process, we construct AlphaResearchComp, a new evaluation benchmark comprising a competition of eight open-ended algorithmic problems, with each problem carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating the possibility of accelerating algorithm discovery with LLMs. Notably, the algorithm discovered by AlphaResearch on the "packing circles" problem achieves the best known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.
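The three-step loop in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: `discovery_loop`, `propose`, `review_score`, `execute`, and `rm_threshold` are hypothetical names, ideas are reduced to single floats, and a greedy keep-the-best rule stands in for whatever selection strategy the paper actually uses. The outcome tally mirrors the three reward cases listed under Figure 5.

```python
import random
from collections import Counter
from typing import Callable, Optional, Tuple


def discovery_loop(
    propose: Callable[[float], float],
    review_score: Callable[[float], float],
    execute: Callable[[float], Optional[float]],
    initial_idea: float,
    rm_threshold: float,
    steps: int,
) -> Tuple[float, float, Counter]:
    """Hypothetical propose / verify / optimize loop with two reward signals."""
    best_idea = initial_idea
    best_score = execute(initial_idea)
    if best_score is None:
        raise ValueError("the initial idea must execute successfully")
    outcomes: Counter = Counter()
    for _ in range(steps):
        idea = propose(best_idea)  # (1) propose a new idea from the current best
        if review_score(idea) < rm_threshold:
            outcomes["scrapped"] += 1  # (2a) rejected by the simulated peer review
            continue
        score = execute(idea)  # (2b) execution-based verification
        if score is None:
            outcomes["execution_failed"] += 1
            continue
        outcomes["execution_succeeded"] += 1
        if score > best_score:  # (3) keep only proposals that improve the score
            best_idea, best_score = idea, score
    return best_idea, best_score, outcomes


# Toy stand-ins (assumptions): an "idea" is a number, and the hidden optimum is 3.0.
rng = random.Random(0)


def toy_propose(x: float) -> float:
    return x + rng.uniform(-1.0, 1.0)


def toy_review(x: float) -> float:
    return 10.0 - abs(x - 3.0)  # placeholder for a learned reward model


def toy_execute(x: float) -> Optional[float]:
    return None if x < 0 else -((x - 3.0) ** 2)  # None models a failed run


best_idea, best_score, outcomes = discovery_loop(
    toy_propose, toy_review, toy_execute,
    initial_idea=0.5, rm_threshold=7.0, steps=200,
)
```

In this sketch the review gate runs before execution, so low-scoring ideas never consume an execution step. Whether the actual system orders the two checks this way is an assumption.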

Paper Structure

This paper contains 56 sections, 15 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of OpenEvolve (with program-based reward), ShinkaEvolve (with program-based reward), and AlphaResearch (with program-based and peer-review rewards). We run the three agents on the Packing Circles (n=26) problem. AlphaResearch achieves better performance than the others.
  • Figure 2: Launching AlphaResearch involves two steps: (1) train reward models on real-world peer-review records; (2) prepare the initial research proposals, initial programs, and evaluation program. AlphaResearch then refines the research proposals and programs autonomously.
  • Figure 3: Execution-based reward of AlphaResearch on the packing circles (n=26) problem (left) and the third autocorrelation inequality problem (right).
  • Figure 4: Comparison of ideas from an execution-only research agent and from AlphaResearch, where AlphaResearch-RM-7B is used.
  • Figure 5: Reward overview during the discovery process. Each action in AlphaResearch receives one of three kinds of reward: (1) the idea is scrapped because its RM score falls below the threshold, (2) the idea executes successfully, or (3) the idea fails to execute.
  • ...and 3 more figures
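
The execution-based reward on packing circles referenced in Figures 1 and 3 can be illustrated with a small validity checker. This is a minimal sketch, assuming the commonly used formulation of the problem (non-overlapping circles inside the unit square, scored by the sum of their radii). The function name `packing_score`, the tolerance `tol`, and the `(x, y, r)` tuple layout are hypothetical and are not taken from the paper's evaluation program.

```python
import math
from typing import List, Optional, Tuple

Circle = Tuple[float, float, float]  # (center x, center y, radius); assumed layout


def packing_score(circles: List[Circle], tol: float = 1e-9) -> Optional[float]:
    """Return the sum of radii if the packing is valid, otherwise None."""
    for x, y, r in circles:
        if r <= 0:
            return None
        # Every circle must lie inside the unit square [0, 1] x [0, 1].
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return None
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            # Circles may touch but must not overlap.
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return None
    return sum(r for _, _, r in circles)


# Four tangent circles of radius 0.25 on a 2 x 2 grid form a valid packing.
grid = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25), (0.25, 0.75, 0.25), (0.75, 0.75, 0.25)]
grid_score = packing_score(grid)

# Two overlapping circles are rejected.
overlap_score = packing_score([(0.3, 0.3, 0.3), (0.5, 0.5, 0.3)])
```

Returning `None` for an invalid configuration keeps failed executions distinct from low but valid scores, which matches the separate success and failure cases in Figure 5.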