Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis
Carlo Bosio, Matteo Guarrera, Alberto Sangiovanni-Vincentelli, Mark W. Mueller
TL;DR
This work addresses the challenge of designing interpretable, high-performing control policies by decoupling symbolic structure synthesis from numerical parameter tuning. It introduces a hybrid symbolic-numeric loop in which a large language model proposes policy structures while a gradient-free (zeroth-order) optimizer tunes their numerical parameters inside the search loop, yielding higher performance and better sample efficiency than purely LLM-driven search. Across a set of control tasks, including pendulum swing-up and several locomotion benchmarks, the approach produces compact, readable policies (often under 35 lines with fewer than 10 parameters) that outperform baselines. The method bridges language-model-guided design and classical control tuning, enabling faster development of interpretable controllers with potential for practical deployment; the code is open-sourced.
Abstract
Large language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable, program-like representations through iterative search. However, these models cannot separate the functional structure of a policy from the numerical values that parametrize it, which makes the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an additional optimization layer for local parameter search. In our method, the numerical parameters of LLM-generated programs are extracted and optimized numerically to maximize task performance. With this integration, an LLM iterates over the functional structure of programs, while a separate optimization loop finds a locally optimal set of parameters for each candidate program. We evaluate our method on a set of control tasks, showing that it achieves higher returns and improved sample efficiency compared to purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.
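To make the decoupling concrete, the following is a minimal sketch of the inner parameter-search loop only. It is not the authors' implementation. The policy structure (a PD law standing in for an LLM-proposed program), the toy point-mass task, the cost weights, and the hill-climbing optimizer are all illustrative assumptions. The abstract does not name the specific gradient-free optimizer used, so a simple random-perturbation search is swapped in here.

```python
import random


def policy(state, params):
    """Hypothetical LLM-proposed structure: a PD law with two free parameters."""
    x, v = state
    kp, kd = params
    return -kp * x - kd * v


def rollout_return(params, steps=200, dt=0.05):
    """Return (negative quadratic cost) of the policy on a toy point mass.

    The dynamics and cost weights are assumptions chosen for illustration.
    """
    x, v = 1.0, 0.0
    total = 0.0
    for _ in range(steps):
        # Clip the command to mimic actuator saturation.
        u = max(-2.0, min(2.0, policy((x, v), params)))
        v += u * dt
        x += v * dt
        total -= (x * x + 0.1 * u * u) * dt
    return total


def local_search(params, evaluate, iters=300, sigma=0.3, seed=0):
    """Zeroth-order hill climbing: keep a Gaussian perturbation only if it
    improves the return. No gradients of the return are ever computed."""
    rng = random.Random(seed)
    best, best_ret = list(params), evaluate(params)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        ret = evaluate(cand)
        if ret > best_ret:
            best, best_ret = cand, ret
    return best, best_ret


# Initial parameter guess, as an LLM might write it into the program text.
initial = [0.1, 0.0]
tuned, tuned_ret = local_search(initial, rollout_return)
```

In a full loop of the kind the abstract describes, the tuned return rather than the return of the initial guess would presumably be the score reported for each candidate structure, so that a good structure is not discarded because of poorly chosen constants.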
