Grammar Prompting for Domain-Specific Language Generation with Large Language Models

Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim

TL;DR

Grammar prompting introduces a principled way to embed domain-specific knowledge into LLM-based DSL generation by conditioning outputs on minimal specialized grammars inferred from demonstrations. By predicting a grammar $\widehat{G}$ before generating the program $\widehat{y}$, the approach encourages the output to respect syntactic constraints, and it can be paired with constrained decoding to guarantee validity. Across semantic parsing, AI planning, and molecule generation, grammar prompting yields measurable improvements over standard prompting, especially in true few-shot settings and on compositional generalization tasks. The results suggest that using metalanguages like BNF within prompts can expand the capabilities of LLMs for domain-specific, structured tasks and augment classical methods in planning and synthesis.

Abstract

Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples. However, for generating strings from highly structured languages (e.g., semantic parsing to complex domain-specific languages), it is challenging for the LLM to generalize from just a few exemplars. We propose grammar prompting, a simple approach to enable LLMs to use external knowledge and domain-specific constraints, expressed through a grammar in Backus-Naur Form (BNF), during in-context learning. Grammar prompting augments each demonstration example with a specialized grammar that is minimally sufficient for generating the particular output example, where the specialized grammar is a subset of the full DSL grammar. For inference, the LLM first predicts a BNF grammar given a test input, and then generates the output according to the rules of the grammar. Experiments demonstrate that grammar prompting can enable LLMs to perform competitively on a diverse set of DSL generation tasks, including semantic parsing (SMCalFlow, Overnight, GeoQuery), PDDL planning, and SMILES-based molecule generation.
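
The prompt construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy calendar grammar, the `Input:`/`Grammar:`/`Program:` prompt layout, and the helper names `specialize`, `to_bnf`, and `build_prompt` are all hypothetical. The sketch also assumes that the set of rules used to derive each demonstration output is already known (for example, from parsing the output with the full grammar); it does not implement that parsing step.

```python
# Grammars are dicts: nonterminal -> list of alternatives (tuples of symbols).
# Hypothetical toy calendar DSL; not the grammar used in the paper.
FULL_GRAMMAR = {
    "call": [("CreateEvent", "(", "args", ")"), ("QueryEvent", "(", "args", ")")],
    "args": [("arg",), ("arg", "&", "args")],
    "arg": [("subject",), ("start",), ("attendee",)],
}


def specialize(full_grammar, used_rules):
    """Return the subset of full_grammar whose rules appear in used_rules.

    used_rules is a set of (lhs, rhs) pairs, assumed to come from a
    derivation of one demonstration output under the full grammar.
    """
    specialized = {}
    for lhs, alternatives in full_grammar.items():
        kept = [rhs for rhs in alternatives if (lhs, rhs) in used_rules]
        if kept:
            specialized[lhs] = kept
    return specialized


def to_bnf(grammar):
    """Render a grammar dict as BNF text, one nonterminal per line."""
    lines = []
    for lhs, alternatives in grammar.items():
        rhs_text = " | ".join(" ".join(rhs) for rhs in alternatives)
        lines.append(f"{lhs} ::= {rhs_text}")
    return "\n".join(lines)


def build_prompt(demos, test_input):
    """Interleave (input, specialized grammar, program) demonstrations.

    The prompt ends with an open 'Grammar:' slot so that the model is asked
    to predict a specialized grammar before it predicts the program.
    """
    parts = []
    for x, grammar, y in demos:
        parts.append(f"Input: {x}\nGrammar:\n{to_bnf(grammar)}\nProgram: {y}\n")
    parts.append(f"Input: {test_input}\nGrammar:\n")
    return "\n".join(parts)


# Rules used to derive the demonstration program below (assumed given).
used = {
    ("call", ("CreateEvent", "(", "args", ")")),
    ("args", ("arg",)),
    ("arg", ("subject",)),
}
demo_grammar = specialize(FULL_GRAMMAR, used)
prompt = build_prompt(
    [("schedule a standup", demo_grammar, "CreateEvent ( subject )")],
    "schedule a review starting Monday",
)
```

Because each demonstration carries only the rules its own output needs, the prompt stays short even when the full DSL grammar is large, which is the motivation the abstract gives for using minimally sufficient grammars.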

Paper Structure (36 sections, 10 equations, 12 figures, 8 tables, 1 algorithm)


Figures (12)

  • Figure 1: A simple BNF grammar for a calendar DSL.
  • Figure 2: Example of grammar prompting for a calendar DSL. We interleave the minimal specialized grammar $G[{\bm{y}}^{(i)}]$ between the demonstrations ${\bm{x}}^{(i)}$ and ${\bm{y}}^{(i)}$. During decoding, the LLM first predicts the specialized grammar $\widehat{G}$, and then predicts the program $\widehat{{\bm{y}}}$ conditioned on $\widehat{G}$. The blue portion is not part of the actual prompt and only shown for illustrative purposes.
  • Figure 3: Illustration of how a predicted program is corrected in our proposed Earley-based constrained decoding. The final partial program will be subsequently fed into the LLM for continuation.
  • Figure 4: Example of a specialized grammar for generating a molecule from the Acrylates class.
  • Figure 5: Example of a specialized grammar for PDDL planning in the Blocks domain. Given an input ${\bm{x}} = ({\bm{s}}_0, {\bm{s}}_g)$, the specialized grammar $G[{\bm{y}}]$ only includes necessary actions for solving this task.
  • ...and 7 more figures
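
The captions for Figures 2 and 3 describe generating a program conditioned on a predicted grammar and correcting predictions that violate it. The paper's decoder is Earley-based; the sketch below is a much simpler stand-in that only checks whether a complete token sequence is derivable under a given grammar. It is a naive memoized top-down recognizer and assumes the grammar has no left recursion (a left-recursive rule would recurse forever). The toy grammar and the names `accepts` and `SPECIALIZED` are hypothetical.

```python
from functools import lru_cache

# Hypothetical specialized grammar: nonterminal -> list of alternatives.
# Any symbol that is not a key of the dict is treated as a terminal token.
SPECIALIZED = {
    "call": [("CreateEvent", "(", "args", ")")],
    "args": [("arg",), ("arg", "&", "args")],
    "arg": [("subject",), ("start",)],
}


def accepts(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    Assumption: the grammar is not left-recursive. This is a plain
    recognizer for illustration, not the Earley-based constrained
    decoding referred to in the Figure 3 caption.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def ends(symbol, start):
        """Positions where `symbol` can finish when it begins at `start`."""
        if symbol not in grammar:  # terminal: must match the next token
            if start < len(tokens) and tokens[start] == symbol:
                return frozenset({start + 1})
            return frozenset()
        result = set()
        for rhs in grammar[symbol]:
            positions = {start}
            for sym in rhs:
                # Advance every live position through the next RHS symbol.
                positions = {e for p in positions for e in ends(sym, p)}
                if not positions:
                    break
            result |= positions
        return frozenset(result)

    return len(tokens) in ends(start_symbol, 0)


valid = accepts(SPECIALIZED, "call", "CreateEvent ( subject & start )".split())
invalid = accepts(SPECIALIZED, "call", "QueryEvent ( subject )".split())
```

In this example `valid` is `True` and `invalid` is `False`, since `QueryEvent` is not among the rules of the specialized grammar. A check like this can only reject a finished program; constrained decoding as described in the paper intervenes during generation instead.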