Adaptable Logical Control for Large Language Models
Honghua Zhang, Po-Nien Kung, Masahiro Yoshida, Guy Van den Broeck, Nanyun Peng
TL;DR
Ctrl-G is a framework that couples a frozen, production-ready LLM with a distilled Hidden Markov Model (HMM) to enforce, at inference time, logical constraints represented as deterministic finite automata (DFAs). The method guarantees constraint satisfaction, supports arbitrary DFA-based constraints without retraining, and applies to tasks ranging from interactive text editing to commonsense generation and text infilling. Using an efficient algorithm for marginalizing an HMM over a DFA, Ctrl-G outperforms larger models such as GPT-3.5 and GPT-4 on constrained generation benchmarks and achieves 100% constraint adherence on several tasks. The work also explores broader uses, including assisting LLM reasoning on the Grade School Math (GSM) benchmark and potential applications in detoxification and topic/sentiment control, suggesting practical value for controllable LLM generation.
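One way to read the coupling described above is as a reweighting of the LLM's next-token distribution. This is a sketch under assumed notation, not a formula quoted from the text: $\alpha$ stands for the event that the completed sequence is accepted by the DFA, $p_{\text{LM}}$ for the frozen LLM, and $p_{\text{HMM}}$ for the distilled HMM.

```latex
p_{\text{Ctrl-G}}(x_t \mid x_{<t}, \alpha)
  \;\propto\;
  p_{\text{LM}}(x_t \mid x_{<t}) \,\cdot\, p_{\text{HMM}}(\alpha \mid x_{<t}, x_t)
```

Under this reading, the second factor is where marginalizing the HMM over the DFA is needed: it sums over all hidden-state paths and all future tokens that end in an accepting DFA state, and any token for which it is zero can never be generated.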
Abstract
Despite the success of Large Language Models (LLMs) on various tasks following human instructions, controlling model generation at inference time remains a persistent challenge. In this paper, we introduce Ctrl-G, an adaptable framework that facilitates tractable and flexible control of LLM generation to reliably follow logical constraints. Ctrl-G combines any production-ready LLM with a Hidden Markov Model (HMM), enabling LLM outputs to adhere to logical constraints represented as deterministic finite automata (DFAs). We show that Ctrl-G, when applied to a TULU2-7B model, outperforms GPT-3.5 and GPT-4 on the task of interactive text editing: specifically, for the task of generating text insertions/continuations following logical constraints, Ctrl-G achieves an over 30% higher satisfaction rate in human evaluation compared to GPT-4. When applied to medium-size language models (e.g., GPT2-large), Ctrl-G also beats its counterparts for constrained generation by large margins on standard benchmarks. Additionally, as a proof-of-concept study, we apply Ctrl-G to the Grade School Math benchmark to assist LLM reasoning, foreshadowing the application of Ctrl-G, as well as other constrained generation approaches, beyond traditional language generation tasks.
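To make the HMM-plus-DFA idea concrete, the following toy sketch uses a three-token vocabulary, a two-state HMM, and a DFA that accepts any sequence containing the token "b". All names and numbers (`accept_table`, `constrained_next`, `INIT`, `TRANS`, `EMIT`, the stand-in LLM distribution `lm`) are hypothetical and chosen for illustration; this is not the authors' implementation, only a minimal dynamic program in the same spirit.

```python
# Toy illustration only: every name and number below is hypothetical.
from itertools import product

VOCAB = ["a", "b", "c"]
TOKENS = range(len(VOCAB))

# DFA over token ids: accept iff the sequence contains "b" at least once.
DFA_STATES = [0, 1]
DFA_START, DFA_ACCEPT = 0, {1}

def dfa_step(state, token):
    """State 1 is absorbing; reading "b" moves from state 0 to state 1."""
    return 1 if state == 1 or VOCAB[token] == "b" else 0

# Tiny HMM with two hidden states (rows of TRANS and EMIT sum to 1).
HIDDEN = range(2)
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
EMIT = [[0.8, 0.1, 0.1], [0.1, 0.3, 0.6]]

def accept_table(max_remaining):
    """w[r][z][s] = P(DFA ends accepting | hidden state z is about to emit,
    DFA is in state s, exactly r tokens are still to be emitted)."""
    w = [[[1.0 if s in DFA_ACCEPT else 0.0 for s in DFA_STATES] for _ in HIDDEN]]
    for _ in range(max_remaining):
        prev = w[-1]
        cur = [[0.0 for _ in DFA_STATES] for _ in HIDDEN]
        for z, s in product(HIDDEN, DFA_STATES):
            total = 0.0
            for v in TOKENS:
                nxt = dfa_step(s, v)
                future = sum(TRANS[z][z2] * prev[z2][nxt] for z2 in HIDDEN)
                total += EMIT[z][v] * future
            cur[z][s] = total
        w.append(cur)
    return w

def constrained_next(lm_probs, belief, dfa_state, remaining, w):
    """Reweight lm_probs by the HMM estimate of P(accept | prefix, next token).
    `belief` is the HMM's distribution over the hidden state about to emit."""
    scores = []
    for v in TOKENS:
        nxt = dfa_step(dfa_state, v)
        p_v = sum(belief[z] * EMIT[z][v] for z in HIDDEN)
        p_v_and_accept = sum(
            belief[z] * EMIT[z][v]
            * sum(TRANS[z][z2] * w[remaining - 1][z2][nxt] for z2 in HIDDEN)
            for z in HIDDEN
        )
        scores.append(lm_probs[v] * (p_v_and_accept / p_v if p_v > 0 else 0.0))
    norm = sum(scores)
    if norm == 0:
        raise ValueError("constraint cannot be satisfied from this state")
    return [x / norm for x in scores]

w = accept_table(3)
lm = [0.5, 0.2, 0.3]  # stand-in for the LLM's next-token distribution
# One token left and no "b" yet: all mass is forced onto "b".
print(constrained_next(lm, INIT, DFA_START, 1, w))
```

In this toy, the table `w` depends only on the HMM and the DFA, so it can be computed once per constraint and reused at every decoding step. A fuller version would also update `belief` with the HMM forward pass as each token is generated; here `INIT` is used because the prefix is empty.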
