Automated Algorithmic Discovery for Scientific Computing through LLM-Guided Evolutionary Search: A Case Study in Gravitational-Wave Detection

He Wang, Liang Zeng

TL;DR

The paper presents Evo-MCTS, a domain-agnostic framework that combines reflective code synthesis from large language models with tree-structured evolutionary search to automatically discover interpretable scientific algorithms. Applied to gravitational-wave detection, Evo-MCTS achieves substantial performance gains over both domain-specific baselines and prior LLM-based optimization strategies, while producing transparent algorithmic pathways that can be analyzed post-hoc. The work demonstrates robust generalization, reproducibility across independent runs, and meaningful improvements through domain-knowledge integration, multi-scale evolutionary operations, and reflective reasoning. The framework's design offers a principled pathway to automated algorithm discovery in physics, chemistry, biology, and engineering, where interpretability and physical validity are essential alongside performance.

Abstract

Automated algorithm discovery in scientific computing faces fundamental challenges: vast design spaces with expensive evaluations, domain-specific physical constraints requiring expert knowledge, and the necessity for interpretable solutions that scientists can validate and understand. We present the Evo-MCTS (Evolutionary Monte Carlo Tree Search) framework, integrating large language models (LLMs) with tree-structured evolutionary search for interpretable algorithm discovery. Evo-MCTS combines reflective code synthesis leveraging LLM domain knowledge, multi-scale evolutionary operations on structured code representations, and interpretable algorithmic pathways emerging from tree-guided exploration. When applied to gravitational-wave detection, a challenging domain with continuous parameter spaces and strict physical constraints, Evo-MCTS achieves a 20.2% improvement over domain-specific methods and 59.1% over LLM-based optimization frameworks. This improvement arises from its ability to consistently converge toward interpretable algorithmic structures that integrate multiple functional components. Our domain-agnostic architecture establishes a generalizable methodology for automated algorithm discovery in scientific computing, where algorithmic transparency and physical validity are as essential as performance optimization.

Paper Structure

This paper contains 31 sections, 12 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: LLM-Guided Evolutionary Monte Carlo Tree Search Framework for Automated Algorithm Discovery. (a) Overview of the algorithm discovery pipeline. Starting from raw gravitational wave strain data (left), the framework applies automated algorithmic transformations through LLM-generated code synthesis (center) to produce optimized detection statistics (right). (b) Core architectural components showing the integration of MCTS exploration with evolutionary optimization through dual perspectives of tree search and population evolution. (b.1) UCT-based node selection from initial algorithmic variants including seed algorithms and individual variants, each represented as nodes containing baseline signal processing code. (b.2) MCTS expansion phase where new algorithmic variants are generated through evolutionary operations. Each node contains executable Python code implementing specific detection strategies. (b.3) Algorithm evaluation phase where generated variants are tested against benchmark data to compute fitness scores, determining performance-based selection for subsequent iterations. (b.4) MCTS backpropagation and elite node updates after multiple evolutionary cycles, propagating performance feedback through the tree structure and maintaining diverse high-performing detection strategies. (c) Detailed view of the reflection mechanism during MCTS expansion, showing four evolutionary operations: Parent Crossover, Sibling Crossover, Path-wise Crossover, and Point Mutation.
  • Figure 2: LLM-Driven Algorithmic Evolution Through Reflective Code Synthesis. Demonstration of a single Parent Crossover evolutionary step showing the transformation from two parent algorithms to an enhanced offspring algorithm. (Top row) Code segments from two parent nodes highlighting complementary algorithmic components that will be combined through the crossover operation. (Bottom left, black box) Reflective analysis process showing how the LLM identifies strengths and limitations in the parent algorithms, synthesizing insights about their respective detection strategies and potential synergies. (Bottom right) Generated offspring algorithm code incorporating successful elements from both parents while addressing identified limitations through domain-aware synthesis. This example illustrates the framework's capability to generate physically-motivated algorithmic improvements through automated reasoning, demonstrating how LLM-guided reflection enables discovery of sophisticated signal processing techniques by combining and enhancing existing algorithmic components without manual intervention. The complete reflection prompts and additional evolution examples are provided in Supplementary Information Section S1.
  • Figure 3: Framework Optimization Dynamics and Performance Validation on MLGWSC-1 Benchmark. (a) Evo-MCTS framework adaptation pipeline demonstrating integration with domain-specific evaluation protocols through standardized fitness assessment. The pipeline processes input data through evolved algorithms to produce outputs evaluated against ground truth labels using area under the curve (AUC) metrics. (b) Framework optimization trajectory and diversity analysis across 877 evaluations from 5 independent runs. Combined fitness trajectory (blue dots) with best objective envelope (red line) showing four phase transitions (PT 1-4, orange stars) marking algorithmic breakthroughs with fitness gains $\ge 400$ units. Maximum fitness of 5,241 units achieved, representing a 6-fold improvement from baseline. Diversity metrics include Shannon diversity index (blue, left axis) and Complexity Index of Diversity (CID, red, right axis) with error bars showing standard deviation across runs. Right panel shows fitness-stratified diversity analysis revealing systematic exploration patterns across performance levels. (c) Performance comparison on the MLGWSC-1 Set 4 dataset showing framework validation against seven benchmark algorithms (Sage, Virgo-AUTh, PyCBC, TPI FSU Jena, cWB, MFCNN, CNN-Coinc). Optimization milestones PT-1 through PT-4 demonstrate progressive algorithmic refinement, with PT-4 achieving 20.2% improvement over SOTA baselines. Grey curves represent intermediate solutions explored during optimization, while the red dotted line shows the seed function baseline. Vertical dashed lines indicate evaluation range boundaries (4-1000 events per month). Results validate the framework's systematic exploration capabilities, interpretable algorithmic pathways, and effective convergence toward high-performing solutions through progressive complexity enhancement and multi-component integration strategies.
  • Figure 4: Generalization Validation and Component Effectiveness Analysis. (a) Training versus test performance correlation for 877 algorithmic configurations evaluated under 0.2-second trigger arrival time uncertainty constraint. Each point represents an individual algorithm's fitness scores (computed from AUC metrics) on training (7-day dataset) and test (1-day independent dataset) data. Linear correlation coefficient $r = 0.840$ indicates strong generalization capability, while variance reflects expected performance variation due to non-stationary, non-Gaussian noise characteristics. Red dashed line shows the empirical trend relationship, while the grey dashed line represents perfect correlation ($y = x$). High-performing algorithms (fitness $> 4000$) demonstrate particularly robust generalization across different noise realizations and signal parameters. (b) MCTS depth-stratified performance analysis across optimization phases. Fitness distribution of algorithms organized by MCTS tree depth groups (Depth I: depths 1-4, Depth II: depths 5-7, Depth III: depths 8-10) and phase transitions (PT1-PT4). Training performance (teal) and test performance (pink) are shown with violin plots for sample sizes $n \ge 10$ and scatter plots (circles/rectangles) for $n < 10$. The analysis reveals systematic migration of high-fitness algorithms toward deeper tree layers as optimization progresses, with elite algorithms (fitness $> 5000$) emerging exclusively in deeper layers during PT4. Enhanced generalization capability is observed in deeper layers during later optimization phases, as evidenced by improved training-test performance alignment in Depth III compared to shallower depth groups. (c) Algorithmic component impact analysis. Violin plots comparing normalized fitness distributions between algorithms with specific techniques (left) versus without (right). Techniques categorized as conditioning methods (teal), time-frequency analysis (orange), and trigger detection (green). Technique effectiveness is determined by distributional separation: wider gaps between left and right distributions indicate stronger performance impact. Conditioning techniques (Savitzky-Golay filtering, Adaptive Gain Regularization) and trigger detection methods (Curvature Analysis, Continuous Wavelet Transform Validation) demonstrate the most substantial improvements through clear distributional shifts toward higher fitness values. Statistical validation across 1,000 resampling iterations confirms significance ($p < 0.001$) and practical importance.
  • Figure 5: Algorithmic Evolutionary Pathway of MCTS and Edge Robustness Analysis. (a) Complete MCTS tree structure showing all nodes associated with the PT4 algorithm (node 486, fitness=5241.4) discovered in an optimization run. Node sizes encode fitness values (larger circles = higher performance), with evaluation times displayed inside circles. Node colors indicate expansion operation types: Parent Crossover (orange), Sibling Crossover (cyan), Path-wise Crossover (green), and Point Mutation (purple). Solid black lines represent the selected MCTS exploration path, while dashed gray lines indicate nodes referenced in expansion prompts for knowledge synthesis. Five key algorithmic breakthroughs are annotated: Multi-resolution Thresholding (first appearing at node 12), CWT using Ricker Wavelet (node 28), Tikhonov Regularization (node 140), Curvature Boosting (node 151), and Savitzky-Golay Filter (node 333). These techniques propagate through subsequent generations, demonstrating systematic knowledge accumulation and refinement. The tree visualization reveals how sophisticated detection algorithms emerge through progressive technique integration across multiple MCTS depth levels. (b) Edge robustness analysis for three critical evolutionary transitions. Each subplot shows fitness distributions from 100 independent re-executions of specific edges: Edge 47→69 (early breakthrough, mean fitness 1034.6, 89.25% of variants exceeding preceding node performance), Edge 140→151 (intermediate advancement, mean fitness 1646.8, 52.81% achieving superior fitness with 100% regularization technique inheritance), and Edge 485→486 (final optimization stage, mean fitness 3274.6, 70.65% of variants outperforming node 204, 25.00% surpassing node 485). Vertical reference lines indicate the original node fitness values and key ancestral nodes. The distributions demonstrate the stochastic nature of LLM-driven code generation while confirming consistent discovery of high-performance algorithmic variants with robust knowledge transfer across independent executions.
  • ...and 6 more figures
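
The Figure 1 caption names UCT-based node selection (b.1) and MCTS backpropagation (b.4) without spelling out either step. The following is a minimal sketch of the textbook versions of both, not the paper's implementation. The `Node` class, the exploration constant `c = sqrt(2)`, and the use of a reward in [0, 1] standing in for a normalized fitness score are all assumptions made for illustration; the LLM-driven expansion and the elite-node updates are omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One algorithm variant in the search tree (hypothetical structure)."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total_reward: float = 0.0

    def add_child(self, name: str) -> "Node":
        child = Node(name=name, parent=self)
        self.children.append(child)
        return child


def uct_score(child: Node, c: float = math.sqrt(2)) -> float:
    """Textbook UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return mean + bonus


def select(root: Node) -> Node:
    """Descend from the root, taking the highest-UCT child until a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=uct_score)
    return node


def backpropagate(node: Node, reward: float) -> None:
    """Add the reward to the node and to every ancestor up to the root."""
    current: Optional[Node] = node
    while current is not None:
        current.visits += 1
        current.total_reward += reward
        current = current.parent


if __name__ == "__main__":
    root = Node("seed")
    a = root.add_child("variant_a")
    b = root.add_child("variant_b")
    backpropagate(a, 0.9)  # assumed normalized fitness of variant_a
    backpropagate(b, 0.2)  # assumed normalized fitness of variant_b
    print(select(root).name)  # prints "variant_a"
```

With both children visited once, the exploration bonus is identical for each, so the selection falls to the variant with the higher mean reward. As visit counts diverge, the bonus shifts attention back toward less-explored branches.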