Controllable Text Generation with Language Constraints
Howard Chen, Huihan Li, Danqi Chen, Karthik Narasimhan
TL;DR
The paper tackles controllable text generation when topics and constraints are given in natural language. It introduces Cognac, a knowledge-intensive benchmark built from WordNet and Wikidata and paired with diverse natural language instructions, to evaluate instruction conformance and fluency. It also introduces CognacGen, a method that steers generation using the language model's own internal knowledge. CognacGen combines a generation model with guidance models (binary verifier, top-k tokens, textual example) and a trie-based decoding scheme, and adds a self-guided distillation step via prefix-tuning to handle unseen instructions. Empirical results show that CognacGen outperforms baselines on constrained generation tasks and remains competitive against large models like GPT-3, while requiring only prefix-tuning to adapt to new instructions. The work highlights both substantial progress in knowledge-guided generation and remaining gaps related to knowledge coverage and potential biases in knowledge bases.
Abstract
We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark, Cognac, that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like WordNet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 often fail on this task, and we propose a solution that leverages a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and then uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint-conforming text.
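As a rough illustration of what "uses the guidance to modify the model's token generation probabilities" could look like, the following toy sketch boosts tokens associated with the topic and suppresses tokens associated with the constraint before sampling. It is not the authors' implementation: the function names, the `alpha` and `beta` parameters, the word-level vocabulary, the hard-coded guidance sets, and the rule that the constraint wins when a token appears in both sets are all assumptions made for the example. In the paper, guidance terms are generated by querying the language model and decoding is trie-based over subword tokens.

```python
import math


def softmax(logits):
    """Convert a {token: logit} mapping into a probability distribution."""
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def apply_guidance(logits, topic_terms, constraint_terms,
                   alpha=2.0, beta=float("inf")):
    """Shift logits using guidance terms (hypothetical helper).

    alpha: additive boost for tokens that match the topic guidance.
    beta:  additive penalty for tokens that match the constraint guidance;
           the default of infinity bans them outright.
    Assumption: when a token is in both sets, the constraint takes priority.
    """
    adjusted = {}
    for tok, score in logits.items():
        if tok in constraint_terms:
            adjusted[tok] = score - beta
        elif tok in topic_terms:
            adjusted[tok] = score + alpha
        else:
            adjusted[tok] = score
    return adjusted


# Toy next-token logits; in practice these come from the generation model.
logits = {"whale": 1.0, "poodle": 2.0, "table": 1.5, "beagle": 0.5}

# Hand-written guidance for the instruction
# "talk about mammals, but do not mention any dog breeds".
topic_terms = {"whale", "poodle", "beagle"}
constraint_terms = {"poodle", "beagle"}

probs = softmax(apply_guidance(logits, topic_terms, constraint_terms))
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.3f}")
```

With these toy numbers, the constrained tokens receive zero probability and the on-topic token "whale" becomes the most likely continuation, even though "poodle" had the highest original logit. A softer, finite `beta` would discourage rather than forbid the constrained terms.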
