Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research

Gang Liu, Yihan Zhu, Jie Chen, Meng Jiang

TL;DR

DeepEvolve combines deep research with evolutionary algorithm discovery to overcome the limitations of using either alone: evolution that draws only on an LLM's internal knowledge, and research that proposes ideas without validating them. It couples six coordinated modules (plan, search, write, code, evaluate, and evolutionary selection) with cross-file edits, debugging, and a memory database, and it discovers executable algorithms that outperform the initial baselines across nine cross-domain benchmarks. Empirical results show substantial and sometimes dramatic improvements in the scores of the new algorithms over the initial ones, while analyses identify domain priors, uncertainty estimation, and adaptive loss strategies as key drivers of generalizable gains. The framework demonstrates a practical path toward reliable AI-driven scientific discovery, balancing innovation against implementability and grounding within bounded computational budgets.

Abstract

Large language models hold promise as scientific assistants, yet existing agents either rely solely on algorithm evolution or on deep research in isolation, both of which face critical limitations. Pure algorithm evolution, as in AlphaEvolve, depends only on the internal knowledge of LLMs and quickly plateaus in complex domains, while pure deep research proposes ideas without validation, resulting in unrealistic or unimplementable solutions. We present DeepEvolve, an agent that integrates deep research with algorithm evolution, uniting external knowledge retrieval, cross-file code editing, and systematic debugging under a feedback-driven iterative loop. Each iteration not only proposes new hypotheses but also refines, implements, and tests them, avoiding both shallow improvements and unproductive over-refinements. Across nine benchmarks in chemistry, mathematics, biology, materials, and patents, DeepEvolve consistently improves the initial algorithm, producing executable new algorithms with sustained gains. By bridging the gap between unguided evolution and research without grounding, DeepEvolve provides a reliable framework for advancing scientific algorithm discovery. Our code is available at https://github.com/liugangcode/deepevolve.
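The feedback-driven loop described above can be pictured with a minimal sketch. This is not the authors' implementation: every name here (`Candidate`, `plan`, `search`, `write`, `code`, `evaluate`, `select`, `evolve`, `TARGET`) is hypothetical, the "algorithm" is reduced to a list of numbers, retrieval is a stub that returns placeholder text, and selection is a simple top-k sample. The sketch only illustrates how research steps and evolution steps might alternate while a memory of evaluated candidates accumulates.

```python
import random
from dataclasses import dataclass

TARGET = [3.0, -1.0]  # hypothetical optimum of a toy task, for illustration only


@dataclass
class Candidate:
    idea: str                      # research proposal behind this candidate
    params: list                   # stands in for the candidate's code
    score: float = float("-inf")   # filled in by evaluate()


def plan(parent):
    """Deep research, step 1: turn the parent into research questions."""
    return [f"How could a candidate scoring {parent.score:.3f} be improved?"]


def search(questions):
    """Deep research, step 2: stub retrieval (no real web or paper search)."""
    return [f"(placeholder finding for: {q})" for q in questions]


def write(findings, step):
    """Deep research, step 3: synthesize findings into a proposal."""
    return f"idea-{step} based on {len(findings)} finding(s)"


def code(parent, idea, rng):
    """Evolution, step 1: 'implement' the idea; here, perturb the parameters."""
    new_params = [p + rng.gauss(0.0, 0.5) for p in parent.params]
    return Candidate(idea=idea, params=new_params)


def evaluate(candidate):
    """Evolution, step 2: score the candidate (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(candidate.params, TARGET))


def select(memory, rng, top_k=3):
    """Evolution, step 3: sample the next parent from the top-k candidates."""
    ranked = sorted(memory, key=lambda c: c.score, reverse=True)
    return rng.choice(ranked[:top_k])


def evolve(initial, iterations, seed=0):
    """Alternate research and evolution, keeping every candidate in memory."""
    rng = random.Random(seed)
    initial.score = evaluate(initial)
    memory = [initial]
    for step in range(1, iterations + 1):
        parent = select(memory, rng)
        findings = search(plan(parent))
        idea = write(findings, step)
        child = code(parent, idea, rng)
        child.score = evaluate(child)
        memory.append(child)
    best = max(memory, key=lambda c: c.score)
    return best, memory


if __name__ == "__main__":
    start = Candidate(idea="baseline", params=[0.0, 0.0])
    best, memory = evolve(start, iterations=20, seed=1)
    print(best.idea, round(best.score, 3), len(memory))
```

Because the initial candidate stays in memory, the best score returned can never fall below the baseline. In a real system the stubbed `search` and `code` steps would be where external retrieval, cross-file editing, and debugging take place.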

Paper Structure

This paper contains 61 sections, 1 equation, 17 figures, 4 tables.

Figures (17)

  • Figure 1: The top panel shows AlphaEvolve-style pure algorithm evolution without deep research, where the best improvement appears in the first generation and later iterations have marginal gains. The bottom panel shows DeepEvolve, which integrates deep research. DeepEvolve avoids shallow or excessively deep but unproductive evolutions, achieving sustained progress with clear performance jumps at key iterations. $+$ denotes adding a new idea, and $\circlearrowright$ denotes refining a previous idea.
  • Figure 2: DeepEvolve is structured around six collaborative modules that alternate between deep research and algorithm evolution. Deep research generates informed hypotheses through planning, retrieval, and synthesis, while algorithm evolution translates these hypotheses into code, evaluates them, and applies evolutionary strategies for selection.
  • Figure 3: Evaluation of the ideas behind the initial and new algorithms, using an LLM as a judge.
  • Figure 4: The new model.forward() for Molecular Prediction. DeepEvolve proposes contrastive learning in Lines 29-34, motif-aware masking in Line 8, and additional modules (see \ref{fig:idea-evo}) to improve the algorithm. The code of these functions is in \ref{sec:add-proposed-mol-code}.
  • Figure 5: Changes of scores over iterations.
  • ...and 12 more figures