AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Zhaojian Yu, Kaiyue Feng, Yilun Zhao, Shilin He, Xiao-Ping Zhang, Arman Cohan
TL;DR
AlphaResearch addresses the challenge of autonomous algorithm discovery by coupling idea generation with program-based verification inside a dual environment that also mimics real-world peer-review rewards. The system trains AlphaResearch-RM-7B on ICLR peer-review data to score new ideas and uses an automatic code-executor to generate and evaluate programs, refining proposals iteratively. Empirically, AlphaResearch achieves a $2/8$ win rate against human researchers and surpasses human and AlphaEvolve baselines on the Packing Circles problem with a best-known score of $2.939$, while six other problems reveal remaining gaps in discovery capability. The paper introduces AlphaResearchComp, a curated benchmark of eight open-ended tasks, and analyzes the benefits and limitations of dual-reward autonomous search for open-ended scientific knowledge expansion, highlighting potential for future scaling and tool integration.
Abstract
Large language models have made significant progress on complex but easy-to-verify problems, yet they still struggle with discovering the unknown. In this paper, we present \textbf{AlphaResearch}, an autonomous research agent designed to discover new algorithms on open-ended problems. To jointly promote feasibility and innovation in the discovery process, we construct a novel dual research environment that combines execution-based verification with a simulated real-world peer-review environment. AlphaResearch discovers new algorithms by iterating three steps: (1) propose new ideas, (2) verify the ideas in the dual research environment, and (3) optimize the research proposals for better performance. To promote a transparent evaluation process, we construct \textbf{AlphaResearchComp}, a new evaluation benchmark comprising a competition of eight open-ended algorithmic problems, each carefully curated and verified through executable pipelines, objective metrics, and reproducibility checks. AlphaResearch achieves a 2/8 win rate in head-to-head comparison with human researchers, demonstrating the possibility of accelerating algorithm discovery with LLMs. Notably, the algorithm discovered by AlphaResearch on the \emph{``packing circles''} problem achieves the best-known performance, surpassing the results of human researchers and strong baselines from recent work (e.g., AlphaEvolve). Additionally, we conduct a comprehensive analysis of the remaining challenges in the 6/8 failure cases, providing valuable insights for future research.
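The propose, verify, optimize loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Proposal`, `research_loop`, `review_score`, `execute_score`, and `review_threshold` are hypothetical, and the sketch assumes that the peer-review score acts as a gate before execution and that higher execution scores are better, neither of which the abstract specifies. The toy functions stand in for the language model, the reward model, and the code executor.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Proposal:
    idea: str     # natural-language research idea
    program: str  # candidate program implementing the idea


def research_loop(
    propose: Callable[[Proposal], Proposal],
    review_score: Callable[[str], float],
    execute_score: Callable[[str], Optional[float]],
    seed: Proposal,
    rounds: int,
    review_threshold: float,
) -> Tuple[Proposal, Optional[float]]:
    """Iterate propose -> verify (review + execution) -> keep the best."""
    best, best_score = seed, execute_score(seed.program)
    for _ in range(rounds):
        candidate = propose(best)  # step (1): propose a new idea
        # Step (2a): simulated peer review (assumed here to gate execution).
        if review_score(candidate.idea) < review_threshold:
            continue
        # Step (2b): execution-based verification; None means the run failed.
        score = execute_score(candidate.program)
        # Step (3): keep the candidate only if it improves on the best so far.
        if score is not None and (best_score is None or score > best_score):
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (all hypothetical) for the LLM, reward model, and executor.
def make_toy_propose() -> Callable[[Proposal], Proposal]:
    calls = {"n": 0}

    def propose(parent: Proposal) -> Proposal:
        calls["n"] += 1
        x = float(parent.program) + 0.5
        # Every second idea is deliberately labeled vague so review rejects it.
        idea = "vague tweak" if calls["n"] % 2 == 0 else "shift x by 0.5"
        return Proposal(idea=idea, program=str(x))

    return propose


def toy_review(idea: str) -> float:
    return 0.0 if "vague" in idea else 1.0


def toy_execute(program: str) -> Optional[float]:
    try:
        x = float(program)
    except ValueError:
        return None  # an invalid program yields no score
    return -(x - 3.0) ** 2  # toy objective, maximized at x = 3


if __name__ == "__main__":
    seed = Proposal(idea="initial guess", program="1.0")
    best, score = research_loop(
        make_toy_propose(), toy_review, toy_execute, seed,
        rounds=6, review_threshold=0.5,
    )
    print(best.program, score)
```

In this toy run, half of the candidates are filtered out by the review gate before any execution happens, which shows how a second reward signal could save executor calls on weak ideas; the real system's interaction between the two signals may differ.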
