Controllable Text Generation with Language Constraints

Howard Chen; Huihan Li; Danqi Chen; Karthik Narasimhan

TL;DR

The paper tackles controllable text generation when topics and constraints are given in natural language. It introduces the Cognac benchmark and CognacGen, a method that steers outputs using the model's own internal knowledge without retraining. The knowledge-intensive Cognac benchmark is built from WordNet and Wikidata and paired with diverse natural language instructions to evaluate instruction conformance and fluency. CognacGen combines a generation model with guidance models (binary verifier, top-k tokens, textual examples) and a trie-based decoding scheme, plus a self-guidance distillation step via prefix-tuning to handle unseen instructions. Empirical results show that CognacGen outperforms baselines on constrained generation tasks and remains competitive against large models like GPT-3, while requiring only prefix-tuning to adapt to new instructions. The work highlights both substantial progress in knowledge-guided generation and remaining gaps related to knowledge coverage and potential biases in knowledge bases.

Abstract

We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark, Cognac, that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like WordNet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 often fail on this task, and propose a solution that leverages a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint-conforming text.
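To make the idea of using guidance terms to modify token generation probabilities more concrete, here is a minimal sketch of trie-based guidance. It is an illustration under stated assumptions, not the paper's implementation: the names Trie, build_trie, next_tokens, and guide_logits are hypothetical, tokens are whitespace-split words rather than subword ids, and the specific rule (add a fixed boost to topic tokens, set constraint tokens to negative infinity) is an assumed stand-in for whatever weighting CognacGen actually uses.

```python
import math
from typing import Dict, Iterable, List, Set


class Trie:
    """Prefix tree over token sequences (one path per guidance phrase)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_end = False

    def insert(self, tokens: Iterable[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, Trie())
        node.is_end = True


def build_trie(phrases: Iterable[str]) -> Trie:
    # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
    root = Trie()
    for phrase in phrases:
        root.insert(phrase.lower().split())
    return root


def next_tokens(trie: Trie, generated: List[str]) -> Set[str]:
    """Tokens that would start a trie phrase or extend one already in progress."""
    candidates = set(trie.children)  # any phrase may start at the next position
    for start in range(len(generated)):
        node = trie
        for tok in generated[start:]:
            node = node.children.get(tok)
            if node is None:
                break
        else:
            # The whole suffix generated[start:] is a path in the trie.
            candidates.update(node.children)
    return candidates


def guide_logits(logits: Dict[str, float], generated: List[str],
                 topic_trie: Trie, constraint_trie: Trie,
                 boost: float = 2.0) -> Dict[str, float]:
    """Return a copy of logits with topic tokens boosted and constraint tokens blocked."""
    guided = dict(logits)
    for tok in next_tokens(topic_trie, generated):
        if tok in guided:
            guided[tok] += boost
    # Applied last so that the constraint wins over the topic on any overlap.
    for tok in next_tokens(constraint_trie, generated):
        if tok in guided:
            guided[tok] = -math.inf
    return guided


# Hypothetical guidance examples and next-token logits, for illustration only.
topic_trie = build_trie(["film director"])
constraint_trie = build_trie(["joe biden", "politician"])
logits = {"joe": 1.0, "biden": 0.5, "film": 0.0, "director": 0.0, "the": 0.2}
step = guide_logits(logits, ["the", "joe"], topic_trie, constraint_trie)
```

In this sketch the constraint trie blocks both the first token of a constraint phrase and any token that would extend a phrase already under way, which is deliberately conservative; a real decoder would operate on subword ids and would likely need a softer rule for tokens shared with unrelated words.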

Paper Structure (42 sections, 3 equations, 4 figures, 10 tables, 1 algorithm)

Introduction
The Cognac Benchmark
Task Setup
Dataset Collection
WordNet.
Wikidata.
Diverse natural language instructions.
Evaluation Metrics
Instruction Conformance (IC).
Copy-BLEU.
Repetition (Rep-n).
Perplexity (PPL).
Method
Overview
Guided Generation
...and 27 more sections

Figures (4)

Figure 1: Constraining instructions and model generations. The green highlight specifies the topic to be covered. The red highlight specifies the constraint to conform to. GPT-3 generates a continuation that mentions a politician, thus violating the constraint. CognacGen generates a continuation that satisfies both the topic requirement and the constraint.
Figure 2: The two stages of CognacGen with textual examples as guidance. Stage 1: The LM generates a list of guidance examples from the queries that specify the topic and constraint. During self-guidance distillation, the topic and constraint prefixes are tuned using the guidance examples as the target and the instruction with demonstrations as input. Stage 2: The guidance model (blue LM and the tuned prefixes) generates guidance examples from the test instance. The guidance examples are used to construct tries for both the topic (green) and the constraint (red). The generation (blue) LM's next-token probability is modified by the tries.
Figure 3: Data generation process for WordNet (left) and Wikidata (right). Note that in WordNet, the topic and constraint need not be connected.
Figure : CognacGen (Textual Example Guidance)
