Synthesizing Interpretable Control Policies through Large Language Model Guided Search

Carlo Bosio; Mark W. Mueller

TL;DR

The paper addresses the need for interpretable control in dynamical systems by encoding policies as Python programs and guiding their synthesis with a pre-trained LLM in a simulation-based evaluation loop. The policy search maximizes $R=\sum_t r_t$ under $x_{t+1}=f(x_t,u_t)$ with $u_t=\texttt{policy}(x_t)$, while the LLM generates candidate policies and a programs database stores high performers for iterative improvement. A two-stage prompt strategy and island-based parallel search enable the combination of ideas from prior policies. The result is compact, readable controllers, demonstrated on the pendulum swing-up and ball-in-cup tasks. The work shows that code-based policy representations can achieve interpretable, verifiable control while still leveraging powerful generative models, albeit at the cost of substantial computation and careful prompt design.
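Prompts built from previously found policies let the LLM combine their ideas. A minimal sketch of such prompt assembly follows. It assumes a FunSearch-style convention in which the sampled programs are renamed to versioned functions (`policy_v0`, `policy_v1`, ...), ordered from worst to best. The prompt then ends with a bare signature for the next version, which the LLM is asked to complete. The function name `build_prompt`, the `state` argument and the docstring wording are hypothetical. The authors' actual template is the one shown in Figure 3 of the paper.

```python
def build_prompt(task_description: str, programs: list[str], name: str = "policy") -> str:
    """Assemble a prompt from sampled programs, ordered worst to best (hypothetical format).

    Each program must define `def <name>(...)`; it is renamed to `<name>_v<i>` so the
    LLM sees a sequence of improving versions and is asked to write the next one.
    """
    if not programs:
        raise ValueError("at least one program (e.g. the starter code) is required")
    parts = [f'"""{task_description}"""', ""]
    for version, source in enumerate(programs):
        # Rename only the first definition of the policy function in each program.
        parts.append(source.replace(f"def {name}(", f"def {name}_v{version}(", 1))
    nxt = len(programs)
    # Bare signature left for the LLM to fill in; the `state` argument is assumed.
    parts.append(f"def {name}_v{nxt}(state):")
    parts.append(f'    """Improved version of `{name}_v{nxt - 1}`."""')
    return "\n".join(parts)


# Usage: two previously stored programs, worst first.
prompt = build_prompt(
    "Swing up the pendulum and balance it upright.",
    ["def policy(state):\n    return 0.0\n", "def policy(state):\n    return 1.0\n"],
)
print(prompt)
```

The LLM output would then be parsed for the body of `policy_v2`, renamed back to `policy`, and passed on for evaluation.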

Abstract

The combination of Large Language Models (LLMs), systematic evaluation, and evolutionary algorithms has enabled breakthroughs in combinatorial optimization and scientific discovery. We propose to extend this powerful combination to the control of dynamical systems, generating interpretable control policies capable of complex behaviors. With our novel method, we represent control policies as programs in standard languages like Python. We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM. Unlike conventional learning-based control techniques, which rely on black-box neural networks to encode control policies, our approach enhances transparency and interpretability. We still take advantage of the power of large AI models, but only at the policy design phase, ensuring that all system components remain interpretable and easily verifiable at runtime. Additionally, the use of standard programming languages makes it straightforward for humans to finetune or adapt the controllers based on their expertise and intuition. We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and the ball in cup tasks. We make the code available at https://github.com/muellerlab/synthesizing_interpretable_control_policies.git.
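The loop of evaluating candidate controllers in simulation and evolving them with an LLM can be sketched as below. This is a minimal, runnable sketch under strong assumptions. The toy scalar plant $x_{t+1} = x_t + 0.1\,u_t$, the reward $r_t = -x_t^2$, the actuator limits, and the island count and capacity are all hypothetical. So are the names `evaluate`, `ProgramsDatabase`, `fake_llm` and `search`. In particular, `fake_llm` is not an LLM: it only perturbs a numeric gain in the best parent program so that the loop runs offline. Candidate programs are executed with plain `exec`, without sandboxing. The score is the return $R=\sum_t r_t$ from the TL;DR.

```python
import random
import re

STARTER = "def policy(x):\n    return -0.500 * x\n"  # hypothetical starter code


def evaluate(source: str, horizon: int = 50) -> float:
    """Execute a candidate program and return R = sum_t r_t from a toy closed-loop rollout."""
    namespace: dict = {}
    exec(source, namespace)  # the candidate must define `policy(x)`; no sandboxing here
    policy = namespace["policy"]
    x, total = 1.0, 0.0
    for _ in range(horizon):
        u = max(-10.0, min(10.0, policy(x)))  # assumed actuator limits
        total += -x * x                        # assumed reward r_t = -x_t^2
        x = x + 0.1 * u                        # assumed toy dynamics x_{t+1} = f(x_t, u_t)
    return total


class ProgramsDatabase:
    """Independent islands of (score, source) pairs; each island keeps only its best programs."""

    def __init__(self, starter: str, num_islands: int = 4, capacity: int = 5):
        seed_entry = (evaluate(starter), starter)
        self.islands = [[seed_entry] for _ in range(num_islands)]
        self.capacity = capacity

    def sample(self, rng: random.Random) -> tuple[int, list[str]]:
        """Pick one island and return its two best programs, ordered worst to best."""
        island = rng.randrange(len(self.islands))
        best_two = sorted(self.islands[island])[-2:]
        return island, [source for _, source in best_two]

    def register(self, island: int, source: str) -> None:
        """Score a new program and keep it only if it ranks among the island's best."""
        try:
            score = evaluate(source)
        except Exception:
            return  # programs that fail to run are discarded
        pool = self.islands[island]
        pool.append((score, source))
        pool.sort(reverse=True)
        del pool[self.capacity:]  # the lowest scorers are discarded

    def best(self) -> tuple[float, str]:
        return max(entry for pool in self.islands for entry in pool)


def fake_llm(parents: list[str], rng: random.Random) -> str:
    """Hypothetical stand-in for the LLM: perturb the gain of the best parent program."""
    gain = float(re.search(r"-([0-9.]+) \* x", parents[-1]).group(1))
    new_gain = max(0.0, gain + rng.gauss(0.0, 1.0))
    return f"def policy(x):\n    return -{new_gain:.3f} * x\n"


def search(iterations: int = 200, seed: int = 0) -> tuple[float, str]:
    rng = random.Random(seed)
    db = ProgramsDatabase(STARTER)
    for _ in range(iterations):
        island, parents = db.sample(rng)             # prompt material from one island
        db.register(island, fake_llm(parents, rng))  # generate, evaluate, store or discard
    return db.best()
```

Calling `search()` returns the best `(score, source)` pair found. In the actual method, `fake_llm` would be replaced by a call to a pre-trained LLM and the toy rollout by the task's simulation environment. This sketch also omits any exchange or reset of islands.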

Paper Structure (16 sections, 13 equations, 7 figures)

INTRODUCTION
RELATED WORK
  Large Language Models
  Learning-Based Control
  Interpretability in Learning-Based Control
METHODOLOGY
  Specification
  Prompt Construction
  Program Generation
  Program Evaluation
  Programs Database
SET UP AND CASE STUDIES
  Setup
  Pendulum swing-up
  Ball in Cup
...and 1 more section

Figures (7)

Figure 1: Schematic of the algorithmic infrastructure for the synthesis of interpretable control policies. The input to the algorithm is a specification file a) containing a task description, the implementation of an evaluation function to score programs, and some starter code for the control policy to evolve. A prompt b) is constructed by pasting in the current best programs (initially, the starter code). The prompt is fed to a Program Generation block c) containing a pre-trained LLM, which produces new candidate programs. The control policies contained in the LLM outputs are fed to the Program Evaluation block d), which scores them based on their performance in simulation. Programs that perform poorly are discarded, while higher-scoring ones are stored in a Database e). Programs are sampled from the Database for inclusion in subsequent prompts and further improved.
Figure 2: Example template for a control synthesis specification.
Figure 3: Example template for a prompt. The LLM generates a body for the provided function signature.
Figure 4: Best-performing control program generated for the pendulum swing-up with our technique. The policy does positive work on the pendulum (injecting energy) whenever it is outside a given angular threshold from the upright position. Otherwise, it switches to a linear controller. The control action is normalized to $[-1,1]$. Therefore, in the initial phase, the control takes one of the two limit values, depending on the sign of the angular velocity.
Figure 5: a) Schematic of the pendulum system and the angle convention. b) Screenshot of a visualization from the simulation environment. c) Example plots of the closed-loop evolution for the swing-up task. The top graph shows bang-bang control in the first phase, followed by linear feedback in the second phase.
...and 2 more figures
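The controller structure described in the Figure 4 caption can be written as a short sketch. It has three parts: saturated energy pumping away from upright, linear feedback near upright, and an output normalized to $[-1,1]$. This is not the program found by the authors. The angle convention (θ = 0 upright, wrapped to [-π, π]), the threshold, and the gains `kp` and `kd` are hypothetical placeholders. The sign of the linear term also assumes that a positive action accelerates θ in the positive direction.

```python
import math


def wrap_angle(theta: float) -> float:
    """Map an angle to [-pi, pi]; 0 is assumed to be the upright position."""
    return math.atan2(math.sin(theta), math.cos(theta))


def swing_up_policy(theta: float, omega: float,
                    threshold: float = 0.3, kp: float = 2.0, kd: float = 0.5) -> float:
    """Hypothetical switching controller with the structure described for Figure 4."""
    error = wrap_angle(theta)  # angular distance from upright
    if abs(error) > threshold:
        # Far from upright: push in the direction of motion so the action does
        # positive work (u * omega >= 0); this branch always sits at a limit value.
        u = 1.0 if omega >= 0.0 else -1.0
    else:
        # Near upright: linear (PD-style) feedback toward theta = 0.
        u = -(kp * error + kd * omega)
    return max(-1.0, min(1.0, u))  # normalized action in [-1, 1]
```

The first branch produces the saturated, bang-bang phase visible in Figure 5c. The second branch takes over once the pendulum is close to upright.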
