Online Prompt Selection for Program Synthesis
Yixuan Li, Lewis Frampton, Federico Mora, Elizabeth Polgreen
TL;DR
CYANEA tackles the core problem of when and how to use LLMs versus symbolic solvers for program synthesis by recasting it as an online, contextual multi-armed bandit problem over a portfolio of solvers and prompting styles. The approach combines a featurized representation of synthesis queries, single- and multi-layer bandit architectures, and time/token budgeting to sequentially deploy solvers and optimize a tunable reward (speed, cost, or success). It integrates a library of prompting styles (natural language, few-shot, Lisp-based prompts, roles, and emotional cues) with an enumerative solver (CEGIS with A* search) and uses k-NN predictors to select among solvers, guided by multiple reward signals. Empirical evaluation on 1269 queries from sygus-comp and related sources shows CYANEA solving 37.2% more queries than the best single solver and operating within 4% of the virtual best, indicating strong practical gains in solving coverage and cost-efficiency for program synthesis tasks.
Abstract
Large Language Models (LLMs) demonstrate impressive capabilities in the domain of program synthesis. This level of performance is not, however, universal across all tasks, all LLMs and all prompting styles. There are many areas where one LLM dominates, one prompting style dominates, or where calling a symbolic solver is a better choice than an LLM. A key challenge for the user then, is to identify not only when an LLM is the right choice of solver, and the appropriate LLM to call for a given synthesis task, but also the right way to call it. A non-expert user who makes the wrong choice, incurs a cost both in terms of results (number of tasks solved, and the time it takes to solve them) and financial cost, if using a closed-source language model via a commercial API. We frame this choice as an online learning problem. We use a multi-armed bandit algorithm to select which symbolic solver, or LLM and prompt combination to deploy in order to maximize a given reward function (which may prioritize solving time, number of synthesis tasks solved, or financial cost of solving). We implement an instance of this approach, called CYANEA, and evaluate it on synthesis queries from the literature in ranking function synthesis, from the syntax-guided synthesis competition, and fresh, unseen queries generated from SMT problems. CYANEA solves 37.2% more queries than the best single solver and achieves results within 4% of the virtual best solver.
