Grammar Prompting for Domain-Specific Language Generation with Large Language Models
Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim
TL;DR
Grammar prompting embeds domain-specific knowledge into LLM-based DSL generation by conditioning outputs on minimal specialized grammars attached to the in-context demonstrations. The LLM first predicts a grammar \widehat{G} and then generates the program \widehat{y} under that grammar, which imposes syntactic constraints on the output; pairing the approach with constrained decoding can guarantee syntactic validity. Across semantic parsing, AI planning, and molecule generation, grammar prompting improves over standard prompting, especially in true few-shot settings and on tasks that require compositional generalization. The results suggest that using metalanguages such as BNF within prompts can extend the capabilities of LLMs on domain-specific, structured tasks and complement classical methods in planning and synthesis.
Abstract
Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples. However, for generating strings from highly structured languages (e.g., semantic parsing to complex domain-specific languages), it is challenging for the LLM to generalize from just a few exemplars. We propose \emph{grammar prompting}, a simple approach to enable LLMs to use external knowledge and domain-specific constraints, expressed through a grammar in Backus--Naur Form (BNF), during in-context learning. Grammar prompting augments each demonstration example with a specialized grammar that is minimally sufficient for generating the particular output example, where the specialized grammar is a subset of the full DSL grammar. For inference, the LLM first predicts a BNF grammar given a test input, and then generates the output according to the rules of the grammar. Experiments demonstrate that grammar prompting can enable LLMs to perform competitively on a diverse set of DSL generation tasks, including semantic parsing (SMCalFlow, Overnight, GeoQuery), PDDL planning, and SMILES-based molecule generation.
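To make the demonstration format concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy calendar-style DSL (`FULL_GRAMMAR` is invented for illustration), takes the specialized grammar to be the set of rules used in one parse of the output program, and only assembles the prompt text; no LLM is called. All function names (`parse`, `specialize`, `to_bnf`, `build_prompt`) and the `Input:/Grammar:/Program:` layout are hypothetical.

```python
# Toy full DSL grammar (an assumption for illustration): nonterminal -> alternatives.
# Any symbol that is not a key of this dict is treated as a terminal token.
FULL_GRAMMAR = {
    "expr": [("(", "CreateEvent", "constraint", ")"),
             ("(", "FindEvent", "constraint", ")")],
    "constraint": [("(", "&", "constraint", "constraint", ")"),
                   ("(", "subject", "string", ")"),
                   ("(", "start", "date", ")"),
                   ("(", "attendee", "string", ")")],
    "date": [("today",), ("tomorrow",)],
    "string": [("standup",), ("Alice",)],
}


def parse(symbol, tokens, pos):
    """Backtracking top-down parse; yields (end_position, rules_used)."""
    if symbol not in FULL_GRAMMAR:  # terminal: must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1, []
        return
    for alt in FULL_GRAMMAR[symbol]:
        for end, used in parse_seq(alt, tokens, pos):
            yield end, [(symbol, alt)] + used


def parse_seq(symbols, tokens, pos):
    """Parse a sequence of symbols left to right."""
    if not symbols:
        yield pos, []
        return
    for mid, used_head in parse(symbols[0], tokens, pos):
        for end, used_tail in parse_seq(symbols[1:], tokens, mid):
            yield end, used_head + used_tail


def specialize(program, start="expr"):
    """Return the subset of FULL_GRAMMAR rules used to derive `program`."""
    tokens = program.replace("(", " ( ").replace(")", " ) ").split()
    for end, used in parse(start, tokens, 0):
        if end == len(tokens):  # accept only parses that consume every token
            spec = {}
            for lhs, alt in used:
                if alt not in spec.setdefault(lhs, []):
                    spec[lhs].append(alt)
            return spec
    raise ValueError("program is not derivable from the full grammar")


def to_bnf(grammar):
    """Render a grammar dict as BNF text, quoting terminal symbols."""
    lines = []
    for lhs, alts in grammar.items():
        rhs = " | ".join(
            " ".join(s if s in FULL_GRAMMAR else f'"{s}"' for s in alt)
            for alt in alts
        )
        lines.append(f"{lhs} ::= {rhs}")
    return "\n".join(lines)


def build_prompt(demos, test_input):
    """Augment each (input, program) demo with its specialized grammar."""
    parts = [
        f"Input: {x}\nGrammar:\n{to_bnf(specialize(y))}\nProgram: {y}\n"
        for x, y in demos
    ]
    # The prompt stops at "Grammar:" so the model predicts a grammar first
    # and only then the program.
    parts.append(f"Input: {test_input}\nGrammar:\n")
    return "\n".join(parts)


demos = [("schedule a standup tomorrow",
          "(CreateEvent (& (subject standup) (start tomorrow)))")]
prompt = build_prompt(demos, "find my meeting with Alice today")
```

In this sketch the specialized grammar for the demonstration keeps only one `expr` alternative and three `constraint` alternatives and drops unused rules such as `FindEvent` and `attendee`. A predicted grammar of the same form could then be handed to a grammar-constrained decoder, which is outside the scope of this example.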
