OptiHive: Ensemble Selection for LLM-Based Optimization via Statistical Modeling
Maxime Bouscary, Saurabh Amin
TL;DR
OptiHive is a two-stage, batched framework for LLM-based optimization. In the first stage, it generates solvers, problem instances, and validation tests in a single batch and filters them with an integer linear program (ILP) so that every retained output is interpretable. In the second stage, it fits an EM-based latent-class model that treats solvers, instances, and tests as noisy contributors, estimates their true performance, and calibrates solver selection under uncertainty. This probabilistic ensemble yields large gains in feasibility and optimality on complex variants of the Multi-Depot Vehicle Routing Problem (MDVRP) and on WSCP benchmarks, with minimal added latency. The approach replaces iterative self-correction loops with principled, uncertainty-aware solver ranking, so it can be wrapped around existing solver-generation pipelines for robust, low-latency deployment. The results show that high-quality solvers can be recovered from imperfect components, particularly when the instance and test signals are diverse and informative, which makes the method practical for scalable natural-language-to-optimization workflows.
Abstract
LLM-based solvers have emerged as a promising means of automating problem modeling and solving. However, they remain unreliable and often depend on iterative repair loops that result in significant latency. We introduce OptiHive, a framework that enhances any solver-generation pipeline to produce higher-quality solvers from natural-language descriptions of optimization problems. OptiHive uses a single batched generation to produce diverse components (solvers, problem instances, and validation tests) and filters out erroneous components to ensure fully interpretable outputs. Accounting for the imperfection of the generated components, we employ a statistical model to infer their true performance, enabling principled uncertainty quantification and solver selection. On tasks ranging from traditional optimization problems to challenging variants of the Multi-Depot Vehicle Routing Problem, OptiHive significantly outperforms baselines, increasing the optimality rate from 5% to 92% on the most complex problems.
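To make the idea of inferring true performance from imperfect components concrete, the following is a minimal sketch of a two-class latent-class model fitted by EM. It is a deliberate simplification and not the model used in OptiHive: it assumes only solvers carry a latent label (correct or incorrect), it ignores problem instances, and it takes as input a hypothetical binary matrix `X` in which `X[i, j] = 1` means solver `i` passed test `j`. The function name `em_latent_class` and the demo data are illustrative assumptions.

```python
import numpy as np

def em_latent_class(X, n_iter=200, eps=1e-6):
    """Two-class latent-class EM on a binary solver-by-test pass matrix.

    Returns (q, a, b) where q[i] is the posterior probability that solver i
    is correct, a[j] is the estimated pass rate of test j on correct solvers,
    and b[j] is the estimated pass rate of test j on incorrect solvers.
    """
    X = np.asarray(X, dtype=float)
    # Initialise posteriors from raw pass rates so that the "correct"
    # class is the one associated with passing more tests.
    q = np.clip(X.mean(axis=1), 0.05, 0.95)
    for _ in range(n_iter):
        # M-step: re-estimate class prior and per-test pass rates.
        pi = np.clip(q.mean(), eps, 1 - eps)
        a = np.clip((q @ X) / (q.sum() + eps), eps, 1 - eps)
        b = np.clip(((1 - q) @ X) / ((1 - q).sum() + eps), eps, 1 - eps)
        # E-step: posterior log-odds of each solver being correct.
        log1 = np.log(pi) + X @ np.log(a) + (1 - X) @ np.log(1 - a)
        log0 = np.log(1 - pi) + X @ np.log(b) + (1 - X) @ np.log(1 - b)
        q = 1.0 / (1.0 + np.exp(np.clip(log0 - log1, -500, 500)))
    return q, a, b

# Hypothetical demo data: three solvers that pass (almost) every test
# and three that fail most of them.
X_demo = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
])
q, a, b = em_latent_class(X_demo)
best = int(np.argmax(q))  # select the solver with the highest posterior
```

Ranking solvers by the posterior `q` rather than by raw pass counts lets unreliable tests be down-weighted automatically, since a test whose estimated pass rates `a[j]` and `b[j]` are close contributes little to the log-odds.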
