Language Model Sentence Completion with a Parser-Driven Rhetorical Control Method

Joshua Zingale, Jugal Kalita

TL;DR

This paper addresses the control of large language model (LLM) text generation by enforcing predefined rhetorical relations between spans, using a plug-in, parser-driven decoding approach that requires no fine-tuning. It pairs BLOOM 1.7B as the generation model with the DMRST parser for Rhetorical Structure Theory (RST), which re-ranks candidate next tokens by how well they satisfy a target relation; candidates are drawn from the top-p nucleus, and a balancing parameter blends model likelihood with parser guidance. Automatic and human evaluations show strong adherence to the desired relations with minimal degradation in perplexity, fluency, or reasonableness, including cross-language results for Spanish. The work enables RST-guided generation, and downstream RST tree generation, in a practical plug-and-play fashion, offering a principled way to steer discourse structure in LLM outputs with modest computational overhead.
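
The re-ranking idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `alpha` parameter, and the linear blend of model probability and parser score are all assumptions made for illustration, and the parser is replaced by a plain callable that returns a score in [0, 1] for each candidate token.

```python
from typing import Callable, Dict


def top_p_nucleus(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of most-probable tokens whose mass reaches p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, mass = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        mass += prob
        if mass >= p:
            break
    return {token: prob / mass for token, prob in nucleus}


def rerank(
    nucleus: Dict[str, float],
    relation_score: Callable[[str], float],
    alpha: float,
) -> Dict[str, float]:
    """Blend LM probability with a parser score for each nucleus token.

    alpha is a hypothetical balancing weight: 0 keeps the LM distribution,
    1 follows the parser score alone. The linear blend is an assumption.
    """
    blended = {
        token: (1.0 - alpha) * prob + alpha * relation_score(token)
        for token, prob in nucleus.items()
    }
    total = sum(blended.values())
    if total == 0.0:
        # Degenerate case: fall back to the unmodified nucleus distribution.
        return dict(nucleus)
    return {token: score / total for token, score in blended.items()}


if __name__ == "__main__":
    # Toy next-token distribution and toy parser scores (both made up).
    lm_probs = {"because": 0.4, "and": 0.3, "but": 0.2, "the": 0.1}
    toy_scores = {"because": 0.9, "and": 0.1}
    nucleus = top_p_nucleus(lm_probs, p=0.6)
    print(rerank(nucleus, lambda tok: toy_scores.get(tok, 0.0), alpha=0.5))
```

In a real system the `relation_score` callable would wrap a call to the discourse parser on the prompt plus the candidate continuation; here it is a lookup table so the sketch stays self-contained.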

Abstract

Controlled text generation (CTG) seeks to guide large language model (LLM) output so that the generated text conforms to desired criteria. The current study presents a novel CTG algorithm that enforces adherence to specific rhetorical relations in an LLM sentence-completion context through a parser-driven decoding scheme that requires no model fine-tuning. The method is validated with both automatic and human evaluation. The code is accessible on GitHub.

Paper Structure (17 sections, 6 equations, 3 figures, 3 tables)

This paper contains 17 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Relation-influenced completions for the sentence, "He came to my house,". The proposed method generates such completions.
  • Figure 2: The generation pipeline. Given the top-$p$ nucleus vocabulary of the distribution from the LLM, the parser re-ranks the tokens according to which tokens better fit the desired relation.
  • Figure 3: At each step of generation, the average difference between the highest and the lowest DMRST parser-assigned score in the nucleus vocabulary across $560$ generations using seven different relations.
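
The quantity described in the Figure 3 caption, the per-step average of the gap between the highest and lowest parser score in the nucleus, can be computed as in the sketch below. The data layout (`generations[g][t]` holding the parser scores at step `t` of generation `g`) and the choice to average each step only over generations that reached it are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def mean_score_spread(
    generations: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Average (max - min) parser score in the nucleus at each generation step.

    generations[g][t] is the list of parser scores assigned to the nucleus
    tokens at step t of generation g (a hypothetical layout).
    """
    n_steps = max((len(gen) for gen in generations), default=0)
    spreads = []
    for t in range(n_steps):
        # Only generations that reached step t with a non-empty nucleus count.
        diffs = [
            max(gen[t]) - min(gen[t])
            for gen in generations
            if t < len(gen) and gen[t]
        ]
        spreads.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return spreads


if __name__ == "__main__":
    # Two toy generations of different lengths with made-up scores.
    toy = [
        [[0.1, 0.9], [0.5, 0.5]],
        [[0.2, 0.4], [0.3, 0.6], [0.0, 1.0]],
    ]
    print(mean_score_spread(toy))
```

A large spread at a given step means the parser strongly distinguishes between the candidate tokens there, so the re-ranking has more influence at that step.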