ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

Robert Tjarko Lange, Yuki Imajuku, Edoardo Cetin

TL;DR

ShinkaEvolve tackles the sample-inefficiency and reproducibility challenges of LLM-driven open-ended discovery by introducing three synergistic innovations: adaptive parent/inspiration sampling, code novelty rejection sampling, and a bandit-based LLM ensemble. The framework maintains an archive of evaluated programs, utilizes world feedback, and employs online meta-learning to guide mutations. Empirical results across circle packing, AIME, ALE-Bench, and MoE load-balancing losses demonstrate state-of-the-art performance with orders-of-magnitude fewer evaluations and with open-source accessibility under Apache 2.0. Collectively, the work broadens the practical reach of open-ended computational discovery while highlighting avenues for automated task generation and self-guided refinement.

Abstract

We introduce ShinkaEvolve: a new open-source framework leveraging large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling inference time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-expert load balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost-efficiency, this work democratizes open-ended discovery across diverse computational problems.

Paper Structure

This paper contains 51 sections, 7 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: High-level overview of ShinkaEvolve. Left: The ShinkaEvolve framework constructs an archive of evaluated programs, rejection-samples new programs, and evaluates their fitness. Right: ShinkaEvolve provides a sample-efficient alternative to AlphaEvolve and outperforms its Circle Packing solution.
  • Figure 2: ShinkaEvolve Parent Sampling. The strategies range from pure exploration (uniform sampling) to pure exploitation (hill-climbing) to a combination of performance and novelty.
  • Figure 3: ShinkaEvolve Program Novelty Rejection Sampling. ShinkaEvolve embeds mutable code snippets and computes similarities across the archive; if the maximal score exceeds a threshold, another LLM is queried to assess whether the program is meaningfully novel.
  • Figure 4: A ShinkaEvolve Meta-Scratchpad. It consists of individual program summaries, global insights, and implementation recommendations, which are appended to the mutation prompt.
  • Figure 5: ShinkaEvolve on Circle Packing Task. Left: ShinkaEvolve outperforms AlphaEvolve's solution in fewer than 150 program evaluations. Right: ShinkaEvolve's program evolution tree demonstrates the iterative composition of stepping stones into high-performing solutions.
  • ...and 9 more figures
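
The novelty rejection step described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the function names, the use of cosine similarity, the 0.95 threshold, and the `judge` callback (a stand-in for the second LLM query) are all assumptions made for the example.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical archive entry: (program source, embedding of its mutable code).
Entry = Tuple[str, Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_novel(
    candidate_code: str,
    candidate_embedding: Vector,
    archive: List[Entry],
    threshold: float = 0.95,  # assumed value, not taken from the paper
    judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Decide whether a candidate program should be kept for evaluation.

    The candidate is accepted outright when its highest similarity to any
    archived program does not exceed the threshold. Otherwise the decision
    is delegated to `judge`, which receives the candidate and its nearest
    archived program. With no judge, near-duplicates are rejected.
    """
    if not archive:
        return True

    # Find the most similar archived program.
    nearest_code, nearest_embedding = max(
        archive, key=lambda entry: cosine_similarity(candidate_embedding, entry[1])
    )
    max_similarity = cosine_similarity(candidate_embedding, nearest_embedding)

    if max_similarity <= threshold:
        return True
    if judge is None:
        return False
    return judge(candidate_code, nearest_code)
```

In practice the `judge` argument would wrap an LLM call that compares the two programs; a rejected candidate would trigger a fresh mutation rather than a costly fitness evaluation, which is where the sample savings would come from under these assumptions.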