Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis

Carlo Bosio, Matteo Guarrera, Alberto Sangiovanni-Vincentelli, Mark W. Mueller

TL;DR

This work addresses the challenge of designing interpretable, high-performing control policies by decoupling symbolic structure synthesis from numerical parameter tuning. It introduces a hybrid symbolic-numeric loop in which a large language model proposes policy structures while a gradient-free, zeroth-order optimizer refines their numerical parameters in the loop, yielding higher performance and better sample efficiency than purely LLM-driven search. Across a set of control tasks, including pendulum swing-up and various locomotion benchmarks, the approach produces compact, readable policies (often under 35 lines with fewer than 10 parameters) that outperform baselines. The method bridges language-model-guided design and classical control tuning, enabling faster, interpretable controller development with practical deployment potential; the code is open-sourced.

Abstract

Large language models (LLMs) have shown promise as generators of symbolic control policies, producing interpretable program-like representations through iterative search. However, these models cannot separate the functional structure of a policy from the numerical values that parametrize it, which makes the search process slow and inefficient. We propose a hybrid approach that decouples structural synthesis from parameter optimization by introducing an additional optimization layer for local parameter search. In our method, the numerical parameters of LLM-generated programs are extracted and optimized numerically to maximize task performance. With this integration, an LLM iterates over the functional structure of programs, while a separate optimization loop finds a locally optimal set of parameters for each candidate program. We evaluate our method on a set of control tasks, showing that it achieves higher returns and improved sample efficiency compared to purely LLM-guided search. We show that combining symbolic program synthesis with numerical optimization yields interpretable yet high-performing policies, bridging the gap between language-model-guided design and classical control tuning. Our code is available at https://sites.google.com/berkeley.edu/colmo.
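To make the decoupling concrete, the following is a minimal sketch, not the authors' implementation. It assumes the policy structure is already fixed (a hand-written PD law stands in for an LLM-generated program) and that its numerical parameters have already been extracted into a list. The toy double-integrator task, the function names `rollout_return` and `random_search`, and the choice of simple random-perturbation hill climbing as the zeroth-order optimizer are all illustrative assumptions.

```python
import random


def rollout_return(params, steps=200, dt=0.05):
    """Closed-loop return of a fixed-structure policy on a toy double integrator.

    Hypothetical task: drive position and velocity to zero from pos = 1.
    """
    kp, kd = params
    pos, vel = 1.0, 0.0
    total = 0.0
    for _ in range(steps):
        u = -kp * pos - kd * vel       # policy structure is fixed; only kp, kd are tuned
        u = max(-2.0, min(2.0, u))     # actuator saturation
        vel += u * dt
        pos += vel * dt
        total -= (pos * pos + 0.01 * u * u) * dt  # negative quadratic cost
    return total


def random_search(objective, x0, iters=300, sigma=0.5, seed=0):
    """Zeroth-order hill climbing: keep a Gaussian perturbation only if it scores higher."""
    rng = random.Random(seed)
    best_x, best_f = list(x0), objective(x0)
    for _ in range(iters):
        cand = [x + rng.gauss(0.0, sigma) for x in best_x]
        f = objective(cand)
        if f > best_f:
            best_x, best_f = cand, f
    return best_x, best_f


initial = [0.1, 0.0]  # parameters as they might appear in a generated program
tuned, tuned_return = random_search(rollout_return, initial)
print("initial return:", rollout_return(initial))
print("tuned return:  ", tuned_return, "with params", tuned)
```

The optimizer only ever calls `objective`, so it needs no gradients and can wrap any simulator-based score. In the setting the abstract describes, the score reported for a candidate program would be the tuned return rather than the return at the parameter values the LLM happened to write.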

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Block diagram of the algorithmic infrastructure for our policy search method. The input to the algorithm is a specification file a) containing a task description, the implementation of an evaluation function to score programs, and some starter code to evolve. A prompt b) is constructed by pasting the current best programs (the starter code at the beginning). The prompt is fed to a Program Generation block c) containing a pre-trained LLM, which produces more programs. The control policies generated by the LLM are fed to the Program Evaluation and Optimization block d), which optimizes them and scores them based on their closed-loop performance in simulation. The program-score pairs are stored in a Database e), from which they are sampled to be included in subsequent prompts and improved. The output is a high-performance, programmatic control policy f).
  • Figure 2: Example pseudo-code template for a control synthesis specification. The objective_fn function is wrapped in a gradient-free optimization (GFO) loop (in orange).
  • Figure 3: Example pseudo-code template for a prompt. The LLM generates a body for the provided function signature, trying to improve upon previously generated functions. policy_v0 and policy_v1 are sampled from the database, with the best-performing policies sampled more frequently.
  • Figure 4: Parallel implementation of program generation and evaluation. Program generation (in blue) runs on the GPUs and evaluation (in orange) runs on the CPUs. In this way, the computational cost of optimization is completely hidden behind program generation.
  • Figure 5: Visualizations of the systems used as benchmarking tasks. Each task's state and action dimensions are reported in parentheses as S and A respectively.
  • ...and 2 more figures
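
The database sampling and prompt construction described in the captions of Figures 1 and 3 can be sketched as follows. This is an illustrative assumption rather than the paper's code: the softmax weighting, the `temperature` parameter, the worst-to-best ordering, and the names `sample_programs` and `build_prompt` are hypothetical. The captions only state that better-scoring policies are sampled more often and that the prompt ends with a function signature for the LLM to complete.

```python
import math
import random


def sample_programs(database, k=2, temperature=1.0, seed=None):
    """Draw k distinct (source, score) pairs, favoring higher scores via softmax weights."""
    rng = random.Random(seed)
    pool = list(database)
    picked = []
    for _ in range(min(k, len(pool))):
        top = max(score for _, score in pool)
        # Subtracting the top score keeps the exponentials numerically safe.
        weights = [math.exp((score - top) / temperature) for _, score in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(idx))  # sample without replacement
    return picked


def build_prompt(picked, signature="def policy_v{n}(obs):"):
    """Paste sampled programs (worst first) and end with the signature to complete."""
    ordered = sorted(picked, key=lambda pair: pair[1])
    parts = [f"# score: {score:.1f}\n{src}" for src, score in ordered]
    parts.append(signature.format(n=len(ordered)))
    return "\n\n".join(parts)


# Hypothetical database of program-score pairs (higher score is better).
database = [
    ("def policy(obs):\n    return 0.0", -500.0),
    ("def policy(obs):\n    return -obs[0]", -200.0),
    ("def policy(obs):\n    return -2.0 * obs[0] - 0.5 * obs[1]", -150.0),
]
picked = sample_programs(database, k=2, temperature=100.0, seed=0)
print(build_prompt(picked))
```

A lower `temperature` concentrates sampling on the top-scoring programs, while a higher one keeps weaker programs in circulation and preserves diversity in the prompts.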