LLM-Guided Compositional Program Synthesis

Ruhma Khan, Sumit Gulwani, Vu Le, Arjun Radhakrishna, Ashish Tiwari, Gust Verbruggen

TL;DR

This work tackles the instability of LLM-driven program synthesis by introducing SymLLM, a failure-guided compositional approach that salvages prefix or suffix fragments of an initially generated program and solves the remaining subproblems using forward and backward execution semantics. Four synthesis strategies (ForwardAll, Forward1, Backward1, and IfThenElse) drive the decomposition and recursive resolution of PBE tasks, and the recursion depth is kept shallow to maintain efficiency. On Playgol-derived benchmarks, Compositional SymLLM solves a portion of the challenging Python tasks that self-reflection does not, and the method also adapts to Excel-formula targets with complementary performance patterns. The results indicate that decomposing PBE problems into solvable subproblems, guided by LLMs and supported by dataflow-inspired salvage strategies, broadens the applicability of LLM-assisted program synthesis in practical settings.

Abstract

Program synthesis from input-output examples, also called programming by example (PBE), has had tremendous impact on automating end-user tasks. Large language models (LLMs) can solve PBE tasks by generating code in different target languages, but they can fail unpredictably. To recover from failure, most approaches, such as self-reflection, use the LLM to solve the same task again, but with a richer context. We introduce a novel technique that recovers from failure by constructing simpler subtasks for the LLM to solve. Our approach performs compositional program synthesis using LLMs, where the LLM not only guides the decomposition of the PBE task into subtasks, but also solves the subtasks. We present different strategies for decomposing the original task. We experimentally show that our approach can solve challenging task instances that are not solved by self-reflection alone.
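The recovery idea in the abstract can be illustrated with a minimal Python sketch of forward composition. It assumes a fragment of the failed program has already been salvaged, runs that fragment on the inputs to obtain intermediate values, and poses the simpler subtask of mapping those values to the desired outputs. The names `satisfies`, `forward_compose`, and `stub_solver` are hypothetical, and `stub_solver` searches a fixed candidate pool purely as a stand-in for the LLM call; this is not the authors' implementation.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]          # (input, expected output)
Program = Callable[[str], str]


def satisfies(program: Program, examples: List[Example]) -> bool:
    """Return True if the program reproduces every expected output."""
    for inp, out in examples:
        try:
            if program(inp) != out:
                return False
        except Exception:
            return False
    return True


def forward_compose(
    salvaged: Program,
    examples: List[Example],
    solve_subtask: Callable[[List[Example]], Optional[Program]],
) -> Optional[Program]:
    """Run the salvaged fragment forward, then solve the remaining subtask."""
    try:
        # Intermediate values become the inputs of the new, simpler PBE task.
        subtask = [(salvaged(inp), out) for inp, out in examples]
    except Exception:
        return None
    rest = solve_subtask(subtask)
    if rest is None:
        return None

    def composed(x: str) -> str:
        return rest(salvaged(x))

    # Always re-check the composition against the original examples.
    return composed if satisfies(composed, examples) else None


def stub_solver(subtask: List[Example]) -> Optional[Program]:
    """Hypothetical stand-in for the LLM: try a small fixed candidate pool."""
    candidates: List[Program] = [str.upper, str.lower, str.title, lambda s: s[::-1]]
    for candidate in candidates:
        if satisfies(candidate, subtask):
            return candidate
    return None


examples = [("  alice smith ", "Alice Smith"), ("BOB JONES", "Bob Jones")]
program = forward_compose(str.strip, examples, stub_solver)
unsolved = forward_compose(str.strip, [("a", "zzz")], stub_solver)
```

Here the salvaged fragment `str.strip` does not solve the task by itself, but the subtask it leaves behind is simple enough for the stand-in solver, and the composed program is accepted only after it passes the original examples.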

Paper Structure

This paper contains 23 sections, 2 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of our approach. We start with input-output (IO) examples. In Step (1), we use an LLM to generate a program ${\textsc{F1}}$. Since ${\textsc{F1}}$ does not generate the desired outputs, (2a) we decompose it and salvage ${\textsc{Fwd1}}$, (2b) we use the LLM to solve the remaining task of transforming the output of ${\textsc{Fwd1}}$ to $O$, and, if successful, (2c) return the composed program. If unsuccessful, (3a) we salvage the full ${\textsc{F1}}$, (3b) use the LLM to transform the output of ${\textsc{F1}}$ to $O$, and, if successful, (3c) return the composed program. If unsuccessful, (4a) we salvage the last step, (4b) use the LLM to synthesize the two pieces needed to produce the two variables used in the last step, and (4c) return the composed program. Our approach also includes a fourth strategy based on if-then-else composition.
  • Figure 2: Decomposing a program. If we view the computation of the output from the inputs as a tree whose root is the output (left) and whose leaves are constants and inputs (right), then the backward1 program is the subtree shown within the dotted oval on the left and the forward1 program is the subtree shown within the dotted oval on the right.
  • Figure 3: Prompt for back-propagating values through backward1 functions. The prompt consists of a system prompt, followed by two examples showing what the user might say and how the assistant is supposed to reply. The last message from the user is instantiated with the actual values.
  • Figure 4: Prompt for generating conditions for use in if-then-else programs to enable parallel composition of two programs to yield a correct final program. Any generated candidate programs can be executed to check for their correctness.
  • Figure 5: The number of benchmarks solved in Python by each technique and by their union, out of the 665 SymLLM-playgol-py hard benchmarks. The Venn diagram shows the same data and also highlights how many benchmarks are solved exclusively by one or two of the three approaches. Each technique solves some benchmarks that no other technique solves.
  • ...and 1 more figures
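The if-then-else composition mentioned in the captions for Figures 1 and 4 can be sketched as follows, assuming two candidate programs that are each correct on only part of the examples. In the paper the conditions come from an LLM prompt; here a fixed list of predicates stands in for that step, and the names `safe_run` and `compose_if_then_else` are hypothetical. As the Figure 4 caption notes, every candidate is checked by executing it.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]
Program = Callable[[str], str]
Condition = Callable[[str], bool]


def safe_run(program: Program, inp: str) -> Optional[str]:
    """Run a program, treating any exception as a wrong answer."""
    try:
        return program(inp)
    except Exception:
        return None


def compose_if_then_else(
    p1: Program,
    p2: Program,
    examples: List[Example],
    candidate_conditions: List[Condition],
) -> Optional[Program]:
    """Return `p1 if cond else p2` for the first condition that fits all examples."""
    for cond in candidate_conditions:
        def composed(x: str, cond: Condition = cond) -> str:
            return p1(x) if cond(x) else p2(x)

        if all(safe_run(composed, inp) == out for inp, out in examples):
            return composed
    return None


# Two partial solutions: each handles one input shape only.
def take_user(s: str) -> str:
    return s.split("@")[0]      # correct for e-mail-like inputs


def take_first_word(s: str) -> str:
    return s.split()[0]         # correct for space-separated names


examples = [("ann@x.org", "ann"), ("bob lee", "bob")]

# Hypothetical stand-in for LLM-proposed conditions.
conditions: List[Condition] = [lambda s: s.isdigit(), lambda s: "@" in s]

combined = compose_if_then_else(take_user, take_first_word, examples, conditions)
```

The first condition is rejected because it routes every input to `take_first_word`; the second separates the two input shapes, so the combined program satisfies both examples.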

Theorems & Definitions (1)

  • Example 1