Adaptable Logical Control for Large Language Models

Honghua Zhang, Po-Nien Kung, Masahiro Yoshida, Guy Van den Broeck, Nanyun Peng

TL;DR

Ctrl-G introduces a versatile framework that couples a frozen production-ready LLM with a distilled Hidden Markov Model to enforce logical constraints represented as deterministic finite automata at inference time. The method provides guaranteed constraint satisfaction, supports arbitrary DFA-based constraints without retraining, and scales across tasks from interactive text editing to commonsense generation and text infilling. By deriving an efficient marginalization algorithm for HMMs over DFAs and demonstrating strong empirical gains, Ctrl-G outperforms larger models like GPT-3.5 and GPT-4 on constrained generation benchmarks and achieves 100% constraint adherence on several tasks. The work also explores broader benefits, including improved reasoning in GSM and potential applications in detoxification and topic/sentiment control, indicating substantial practical impact for controllable LLM generation.

Abstract

Despite the success of Large Language Models (LLMs) on various tasks following human instructions, controlling model generation at inference time poses a persistent challenge. In this paper, we introduce Ctrl-G, an adaptable framework that facilitates tractable and flexible control of LLM generation to reliably follow logical constraints. Ctrl-G combines any production-ready LLM with a Hidden Markov Model, enabling LLM outputs to adhere to logical constraints represented as deterministic finite automata. We show that Ctrl-G, when applied to a TULU2-7B model, outperforms GPT3.5 and GPT4 on the task of interactive text editing: specifically, for the task of generating text insertions/continuations following logical constraints, Ctrl-G achieves over 30% higher satisfaction rate in human evaluation compared to GPT4. When applied to medium-size language models (e.g., GPT2-large), Ctrl-G also beats its counterparts for constrained generation by large margins on standard benchmarks. Additionally, as a proof-of-concept study, we experiment with Ctrl-G on the Grade School Math benchmark to assist LLM reasoning, foreshadowing the application of Ctrl-G, as well as other constrained generation approaches, beyond traditional language generation tasks.

Paper Structure

This paper contains 29 sections, 2 theorems, 9 equations, 9 figures, 5 tables, 1 algorithm.

Key Result

Theorem 3.2

Given a constraint $\alpha$ represented as a DFA with $m$ edges and an HMM with $h$ hidden states, the time complexity for sampling a sequence of $n$ tokens from $p_{\text{ctrl-g}}(x_{1:n} \:\vert\:\alpha)$ is $O(nmh^2)$.
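
To make the quantity behind Theorem 3.2 concrete, the following is a minimal sketch of a backward dynamic program that computes the probability that a length-$n$ sequence drawn from an HMM is accepted by a DFA. It is an illustration under stated assumptions, not the paper's implementation: the function name `accept_prob`, the dictionary encoding of the DFA, and the toy parameters are all hypothetical, and the inner loop runs over individual tokens rather than over DFA edges, so it does not by itself attain the $O(nmh^2)$ bound.

```python
def accept_prob(pi, T, E, delta, start, accepting, n):
    """Probability that n tokens sampled from the HMM are accepted by the DFA.

    pi[z]      : initial hidden-state distribution (h entries)
    T[z][z2]   : hidden-state transition probabilities
    E[z][x]    : emission probabilities over a vocabulary of size V
    delta      : dict mapping (dfa_state, token) -> dfa_state (assumed total)
    """
    if n == 0:
        return 1.0 if start in accepting else 0.0
    h, V = len(pi), len(E[0])
    states = sorted({s for (s, _) in delta} | set(delta.values()))
    # U[s][z]: probability that the remaining tokens drive the DFA from s to
    # acceptance, given the hidden state z of the current step.
    # Base case: no tokens remain, so only the DFA state matters.
    U = {s: [1.0 if s in accepting else 0.0] * h for s in states}
    W = None
    for _ in range(n):
        # W[s][z]: marginalize over the token emitted from hidden state z.
        W = {s: [sum(E[z][x] * U[delta[(s, x)]][z] for x in range(V))
                 for z in range(h)] for s in states}
        # Push W back through the hidden transition into the previous step.
        U = {s: [sum(T[z][z2] * W[s][z2] for z2 in range(h))
                 for z in range(h)] for s in states}
    return sum(pi[z] * W[start][z] for z in range(h))


# Toy example (all numbers are made up): 2 hidden states, vocabulary {0, 1}.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.2, 0.8]]
E = [[0.9, 0.1], [0.3, 0.7]]
# DFA for "token 1 appears at least once": state 0 = not seen, 1 = seen.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

print(accept_prob(pi, T, E, delta, start=0, accepting={1}, n=3))
```

Each of the $n$ steps costs on the order of $|S| \cdot V \cdot h + |S| \cdot h^2$ operations in this naive form, where $|S|$ is the number of DFA states and $V$ the vocabulary size. Grouping tokens that share a DFA edge is what lets the cost depend on the number of edges $m$ instead of on $V$.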

Figures (9)

  • Figure 1: Ctrl-G pipeline; both the LLM and the HMM are frozen once trained.
  • Figure 2: An example usage of Ctrl-G for text insertion with multiple constraints.
  • Figure 3: Example of a DFA representing the logical constraint that the phrase "gets cold" must appear in the generated text along with pseudo-code for representing this DFA in Ctrl-G.
  • Figure 4: An example showing the intersection (logical and) and concatenation of two DFAs.
  • Figure 5: Results on CommonGen+ across different # of concepts.
  • ...and 4 more figures
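
The constraints described in the captions of Figures 3 and 4 (a phrase that must appear, and the logical "and" of two DFAs) can be sketched as follows. This is a hedged illustration, not the pseudo-code from the paper: `phrase_dfa`, `intersect`, and `accepts` are hypothetical helper names, and the example uses a small word-level vocabulary as a simplifying assumption, whereas a real system would work over the LLM's token ids.

```python
def phrase_dfa(phrase, vocab):
    """DFA accepting token sequences that contain `phrase` contiguously.

    State s means "the last s tokens match the first s tokens of phrase".
    Returns (delta, start_state, accepting_states).
    """
    k = len(phrase)
    delta = {}
    for s in range(k + 1):
        for tok in vocab:
            if s == k:
                delta[(s, tok)] = k  # accepting state is absorbing
                continue
            # Longest prefix of phrase that is a suffix of phrase[:s] + [tok].
            seen = list(phrase[:s]) + [tok]
            nxt = len(seen)
            while nxt > 0 and seen[len(seen) - nxt:] != list(phrase[:nxt]):
                nxt -= 1
            delta[(s, tok)] = nxt
    return delta, 0, {k}


def intersect(a, b, vocab):
    """Product construction: accepts exactly what both a and b accept."""
    (da, sa, fa), (db, sb, fb) = a, b
    start = (sa, sb)
    delta, accepting = {}, set()
    frontier, seen = [start], {start}
    while frontier:
        state = frontier.pop()
        p, q = state
        if p in fa and q in fb:
            accepting.add(state)
        for tok in vocab:
            nxt = (da[(p, tok)], db[(q, tok)])
            delta[(state, tok)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return delta, start, accepting


def accepts(dfa, tokens):
    """Run the DFA over tokens and report whether it ends in acceptance."""
    delta, state, accepting = dfa
    for tok in tokens:
        state = delta[(state, tok)]
    return state in accepting


vocab = ["the", "soup", "gets", "cold", "fast", "winter"]
gets_cold = phrase_dfa(["gets", "cold"], vocab)
winter = phrase_dfa(["winter"], vocab)
both = intersect(gets_cold, winter, vocab)

print(accepts(gets_cold, "the soup gets cold fast".split()))
print(accepts(both, "the soup gets cold fast".split()))
print(accepts(both, "winter soup gets gets cold".split()))
```

The product construction only explores state pairs reachable from the start pair, so the intersected automaton has at most the product of the two state counts and often fewer.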

Theorems & Definitions (3)

  • Definition 3.1
  • Theorem 3.2
  • Proposition 4.1