LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design

Amir Sadikov

TL;DR

This paper tackles two core QMC design problems—constructing finite point sets with minimal star discrepancy and optimizing Sobol' direction numbers—by casting them as program-synthesis tasks solved through an LLM-guided evolutionary loop within the OpenEvolve framework. The approach yields new low-discrepancy 2D/3D point configurations that outperform prior benchmarks and discovers Sobol' parameters that reduce rQMC mean-squared error in 32-dimensional option pricing tests, while remaining extensible to arbitrary $N$ and compatible with standard randomizations. Key contributions include a two-phase discovery strategy (direct construction followed by iterative refinement), rigorous discrepancy evaluation, and robust empirical validation across multiple high-dimensional problems. The results demonstrate that LLM-driven evolution can automate high-quality QMC design, recovering classical constructions when optimal and surpassing them when finite-$N$ structure matters, with open data and code for reproducibility.

Abstract

Low-discrepancy point sets and digital sequences underpin quasi-Monte Carlo (QMC) methods for high-dimensional integration. We cast two long-standing QMC design problems as program synthesis and solve them with an LLM-guided evolutionary loop that mutates and selects code under task-specific fitness: (i) constructing finite 2D/3D point sets with low star discrepancy, and (ii) choosing Sobol' direction numbers that minimize randomized QMC error on downstream integrands. Our two-phase procedure combines constructive code proposals with iterative numerical refinement. On finite sets, we rediscover known optima in small 2D cases and set new best-known 2D benchmarks for $N \geq 40$, while matching most known 3D optima up to the proven frontier ($N \leq 8$) and reporting improved 3D benchmarks beyond. On digital sequences, evolving Sobol' parameters yields consistent reductions in randomized quasi-Monte Carlo (rQMC) mean-squared error for several 32-dimensional option-pricing tasks relative to widely used Joe-Kuo parameters, while preserving extensibility to any sample size and compatibility with standard randomizations. Taken together, the results demonstrate that LLM-driven evolutionary program synthesis can automate the discovery of high-quality QMC constructions, recovering classical designs where they are optimal and improving them where finite-$N$ structure matters. Data and code are available at https://github.com/hockeyguy123/openevolve-star-discrepancy.git.
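
To make objective (i) concrete, the star discrepancy of a small 2D point set can be evaluated exactly by checking every corner of the critical grid spanned by the point coordinates and 1. The following is a minimal brute-force sketch, not the evaluation code used in the paper; the function name `star_discrepancy_2d` is hypothetical, and the cubic cost makes it suitable only for small $N$.

```python
import numpy as np


def star_discrepancy_2d(points):
    """Exact star discrepancy of a 2D point set in [0, 1]^2 (brute force).

    Hypothetical helper for illustration only. For every corner q of the
    critical grid it compares the volume of the anchored box [0, q) with the
    fraction of points in the open box [0, q) and in the closed box [0, q].
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Candidate box corners: each point coordinate, plus the upper boundary 1.
    xs = np.unique(np.append(pts[:, 0], 1.0))
    ys = np.unique(np.append(pts[:, 1], 1.0))
    worst = 0.0
    for x in xs:
        for y in ys:
            vol = x * y
            open_count = np.sum((pts[:, 0] < x) & (pts[:, 1] < y))
            closed_count = np.sum((pts[:, 0] <= x) & (pts[:, 1] <= y))
            # Box too empty (open count) or too full (closed count).
            worst = max(worst, vol - open_count / n, closed_count / n - vol)
    return float(worst)


if __name__ == "__main__":
    # A single point at the centre of the unit square.
    print(star_discrepancy_2d([(0.5, 0.5)]))
```

A fitness function of this kind is what a candidate point-set generator would be scored against; for larger $N$ or three dimensions a faster exact or approximate algorithm would be needed.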

Paper Structure

This paper contains 17 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Visualization of $N=16$ point set generation in two dimensions. (A) Initial shifted Fibonacci lattice (Discrepancy: 0.0962). (B) Best direct construction found in Phase 1 (Discrepancy: 0.0924). (C) Final optimized point set from Phase 2 (Discrepancy: 0.0744), which is within 0.68% of the known optimal value of 0.0739.
  • Figure 2: The star discrepancy $D_N^*$ of Sobol', Halton, Hammersley, Fibonacci, rank-1 lattice, MPMC (message-passing Monte Carlo), and LLM-evolved sets for an increasing number of points, $N = 100, \dots, 1020$, in 2D.
  • Figure 3: The percentage reduction in MSE (rQMC integration over 10000 random scrambles and shifts) using Sobol' direction numbers found via LLM evolutionary search, relative to those of Joe and Kuo (2008). The percentage reductions are averaged across all scenarios of each option type (Appendix C).
  • Figure 4: Directly Constructed 16-Point Set ($N=16$)
  • Figure 5: Iteratively Optimized 2D Point Set ($N=16$)