HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis

Shraddha Barke, Emmanuel Anaya Gonzalez, Saketh Ram Kasibatla, Taylor Berg-Kirkpatrick, Nadia Polikarpova

TL;DR

This work introduces a hybrid approach in which LLM completions for a given task are used to learn a task-specific, context-free surrogate model, which then guides program synthesis. The approach outperforms unguided search, direct sampling from LLMs, and existing program synthesizers.

Abstract

Many structured prediction and reasoning tasks can be framed as program synthesis problems, where the goal is to generate a program in a domain-specific language (DSL) that transforms input data into the desired output. Unfortunately, purely neural approaches, such as large language models (LLMs), often fail to produce fully correct programs in unfamiliar DSLs, while purely symbolic methods based on combinatorial search scale poorly to complex problems. Motivated by these limitations, we introduce a hybrid approach, where LLM completions for a given task are used to learn a task-specific, context-free surrogate model, which is then used to guide program synthesis. We evaluate this hybrid approach on three domains, and show that it outperforms both unguided search and direct sampling from LLMs, as well as existing program synthesizers.

Paper Structure (55 sections, 5 equations, 16 figures, 3 algorithms)

Figures (16)

  • Figure 1: Example problems from the three PBE domains we evaluate HySynth on: grid-based puzzles (Arc), tensor manipulation (Tensor), and string manipulation (String).
  • Figure 2: An overview of the hybrid program synthesis technique that uses a context-free LLM approximation. Programs generated by an LLM are used to learn a PCFG, which guides a bottom-up synthesizer to generate programs until a solution is found.
  • Figure 3: A fragment from the context-free grammar of our Arc DSL.
  • Figure 4: (a,b,c) Number of benchmarks solved by HySynth as a function of time for the Arc, Tensor, and String domains; timeout is 10 min. (d) Percentage of syntactically valid completions per domain.
  • Figure 5: HySynth-Arc, HySynth-Tensor, and HySynth-String results guided by a PCFG learned from different numbers of Gpt4o samples (n=10, 20, 50, 100).
  • ...and 11 more figures
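
The step summarized in the Figure 2 caption, learning a PCFG from LLM-generated programs, can be illustrated with a small sketch. This is not the authors' implementation: the toy grammar, the function names `learn_pcfg` and `rule_costs`, the add-alpha smoothing, and the use of negative log probabilities as search costs are all assumptions made for illustration. It assumes each sampled program has already been parsed into the list of grammar rules it uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy grammar: one nonterminal "E" with four production rules.
GRAMMAR = {"E": ["add", "mul", "x", "one"]}

def learn_pcfg(derivations, grammar, alpha=1.0):
    """Estimate rule probabilities from rule-usage counts.

    derivations: one list of (nonterminal, rule) pairs per sampled program.
    alpha: add-alpha smoothing constant (an assumption of this sketch), so
           rules never seen in the samples keep a nonzero probability.
    """
    counts = defaultdict(Counter)
    for derivation in derivations:
        for nonterminal, rule in derivation:
            counts[nonterminal][rule] += 1

    probs = {}
    for nonterminal, rules in grammar.items():
        total = sum(counts[nonterminal][r] for r in rules) + alpha * len(rules)
        probs[nonterminal] = {
            r: (counts[nonterminal][r] + alpha) / total for r in rules
        }
    return probs

def rule_costs(probs):
    """Turn probabilities into costs; likelier rules get lower cost."""
    return {
        nt: {r: -math.log2(p) for r, p in rules.items()}
        for nt, rules in probs.items()
    }

# Two hypothetical parsed LLM samples: add(x, one) and add(x, x).
samples = [
    [("E", "add"), ("E", "x"), ("E", "one")],
    [("E", "add"), ("E", "x"), ("E", "x")],
]
pcfg = learn_pcfg(samples, GRAMMAR)
costs = rule_costs(pcfg)
```

A cost-ordered bottom-up enumerator could then try cheap rules such as `x` and `add` before the unseen rule `mul`, which is one way a learned grammar can steer search toward programs resembling the LLM's samples.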