Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

TL;DR

This work introduces grammar-constrained decoding (GCD), a framework that enforces structured output spaces for language models by describing tasks with formal grammars and constraining decoding via an incremental parser. It extends GCD with input-dependent grammars (IDG), which let the output space adapt to the input, and demonstrates strong few-shot performance on closed information extraction (cIE), entity disambiguation (ED), and constituency parsing (CP) without finetuning. On cIE and ED, GCD with large LMs often rivals or surpasses some finetuned baselines, especially with IDG; CP remains more challenging and still lags behind fully supervised parsers. The study provides practical guidance on when GCD helps, its latency characteristics, and how to mitigate issues such as likelihood misalignment, positioning GCD as a rapid, cost-effective adaptation strategy for structured NLP tasks. Code and data are released to facilitate adoption and further research.

Abstract

Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, limited to specific tasks, such as parsing or code generation. In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that GCD can serve as a unified framework for structured NLP tasks in general. For increased flexibility, we introduce input-dependent grammars, which allow the grammar to depend on the input and thus enable the generation of different output structures for different inputs. We then empirically demonstrate the power and flexibility of GCD-enhanced LMs on (1) information extraction, (2) entity disambiguation, and (3) constituency parsing. Our results indicate that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models. Grammar constraints thus hold great promise for harnessing off-the-shelf LMs for a wide range of structured NLP tasks, especially where training data is scarce or finetuning is expensive. Code and data: https://github.com/epfl-dlab/GCD.
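The core mechanism described above, restricting each decoding step to continuations that the grammar permits, can be illustrated with a minimal sketch. This is not the authors' implementation: the "grammar" here is just an enumerated set of valid token sequences standing in for an incremental parser, and `toy_score`, the vocabulary, and the `[s]`/`[r]`/`[o]` triplet layout are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

EOS = "<eos>"


def allowed_next(prefix: Sequence[str],
                 valid_outputs: Sequence[Tuple[str, ...]]) -> Set[str]:
    """Tokens that keep `prefix` a prefix of at least one valid output."""
    n = len(prefix)
    head = tuple(prefix)
    return {out[n] for out in valid_outputs if len(out) > n and out[:n] == head}


def constrained_greedy_decode(
    score: Callable[[Sequence[str]], Dict[str, float]],
    valid_outputs: Sequence[Tuple[str, ...]],
    max_len: int = 32,
) -> List[str]:
    """Greedy decoding where only grammar-compliant tokens may be chosen."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        allowed = allowed_next(prefix, valid_outputs)
        if not allowed:
            break
        scores = score(prefix)
        # Mask step: tokens outside `allowed` never compete, however high
        # the model scores them. sorted() makes tie-breaking deterministic.
        best = max(sorted(allowed),
                   key=lambda tok: scores.get(tok, float("-inf")))
        prefix.append(best)
        if best == EOS:
            break
    return prefix


def toy_score(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for LM logits; it prefers an off-grammar token."""
    return {"banana": 5.0, "Paris": 2.0, "France": 1.0}


# Hypothetical output space: two well-formed subject-relation-object triplets.
VALID = [
    ("[s]", "Paris", "[r]", "capital_of", "[o]", "France", EOS),
    ("[s]", "France", "[r]", "capital_of", "[o]", "Paris", EOS),
]

result = constrained_greedy_decode(toy_score, VALID)
print(" ".join(result))  # [s] Paris [r] capital_of [o] France <eos>
```

Although the toy scorer always rates `banana` highest, it is never emitted, because it is not a valid continuation at any step. A real system would replace the enumerated set with a parser over a formal grammar and apply the same mask to the model's logits.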

Paper Structure (35 sections, 10 figures, 9 tables)

Figures (10)

  • Figure 1: Grammar-constrained decoding (GCD), applied to the task of closed information extraction, where the goal is to extract a list $y$ of subject--relation--object triplets from the input text $x$. Subjects and objects are constrained to be Wikidata entities, and relations to be Wikidata relations. During decoding, only valid token continuations compliant with the grammar are considered. For simplicity, we omit the special marker symbols [s], [r], and [o] in the schema of the generation process.
  • Figure 2: Formal grammars for 14 structured NLP tasks, highlighting the general applicability of grammar-constrained decoding. All 14 grammars are context-free (mostly regular). * marks input-dependent grammars. Inputs $x=\langle x_0, \dots, x_{n-1} \rangle$ are sequences of lexical units (e.g., words); $0 \leq i \leq n-1$; single capital letters are non-terminal symbols; $S$ or $S_0$ is the start symbol; $\varepsilon$ is the empty string; [ and ] are special terminal symbols.
  • Figure 3: Latency of the WikiNER grammar.
  • Figure 4: Latency of the REBEL grammar.
  • Figure 5: Example of 1-shot CP on PTB instance No. 12. The gold parse tree is "( S ( ADVP-TMP ( RB Now ) ) ( NP-SBJ ( PRP we ) ) ( VP ( VBP 're ) ( PP-LOC-PRD ( IN at ) ( NP ( NP ( DT the ) ( NN bottom ) ) ( PP ( IN of ) ( NP ( DT the ) ( NN heap ) ) ) ) ) ) )". The generation from Vicuna-13B is not correct, but it still looks like a reasonable parse tree. The generation from LLaMA-13B fails to follow the instruction.
  • ...and 5 more figures
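
The Figure 2 caption marks some grammars as input-dependent, and Figure 5 shows the bracketed parse format used for CP. As a rough illustration of what input dependence means for that format, the sketch below computes the allowed next tokens for a bracketed parse so that brackets stay balanced and the leaves reproduce the input words in order. It is not the grammar from Figure 2, whose rules are not reproduced here; the label set, the `<eos>` token, and the function names are hypothetical, and the input is assumed to be non-empty and free of literal bracket tokens.

```python
from typing import Sequence, Set

EOS = "<eos>"
LABELS = {"S", "NP", "VP", "DT", "NN", "VBZ"}  # hypothetical tiny label set


def allowed_next(prefix: Sequence[str], words: Sequence[str]) -> Set[str]:
    """Tokens that may follow a valid `prefix`, given the input `words`."""
    if prefix and prefix[-1] == EOS:
        return set()
    n = len(words)
    depth, used, last = 0, 0, "start"  # open brackets, words emitted, last kind
    for tok in prefix:
        if tok == "(":
            depth, last = depth + 1, "open"
        elif tok == ")":
            depth, last = depth - 1, "close"
        elif last == "open":
            last = "label"          # the token right after "(" is a label
        else:
            used, last = used + 1, "word"
    if last == "start":
        return {"("}
    if last == "open":
        return set(LABELS)
    if last == "label":
        options = {"("}             # open a child constituent
        # A word directly under the root is allowed only if it is the last one,
        # otherwise the tree would close with input words left over.
        if used < n and (depth > 1 or used == n - 1):
            options.add(words[used])  # input-dependent terminal
        return options
    if last == "word":
        return {")"}
    if depth == 0:                  # last == "close" and the tree is complete
        return {EOS}
    options = set()
    if used < n:
        options.add("(")            # another sibling constituent
    if depth > 1 or used == n:
        options.add(")")            # the root closes only when all words are used
    return options


def is_valid(tokens: Sequence[str], words: Sequence[str]) -> bool:
    """Check a full output by replaying it through `allowed_next`."""
    for i, tok in enumerate(tokens):
        if tok not in allowed_next(tokens[:i], words):
            return False
    return bool(tokens) and tokens[-1] == EOS


words = ["the", "dog", "barks"]
good = "( S ( NP ( DT the ) ( NN dog ) ) ( VP ( VBZ barks ) ) ) <eos>".split()
bad = "( S ( NP ( NN dog ) ( DT the ) ) ( VP ( VBZ barks ) ) ) <eos>".split()
print(is_valid(good, words), is_valid(bad, words))  # True False
```

The only place the input enters is `words[used]`: the same code yields a different output space for every input sentence, which is the property that a fixed, input-independent grammar cannot express.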