Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting

Yufei Li; John Nham; Ganesh Jawahar; Lei Shu; David Uthus; Yun-Hsuan Sung; Chengrun Yang; Itai Rolnick; Yi Qiao; Cong Liu

TL;DR

This work tackles generic text rewriting by introducing Dr Genre, a decoupled-reward reinforcement learning framework that covers three rewrite tasks (factuality, stylistic, and conversational) and combines objective-specific rewards through task-specific weights. It constructs ChatRewrite and combines it with LongFact and RewriteLM to form a broad benchmark for training and evaluation, and it shows that decoupled, task-weighted rewards yield higher-quality rewrites across these tasks, improving agreement, coherence, and conciseness. The approach uses supervised fine-tuning on mixed data, LLM-based reward modeling, and PPO-based RL with a KL constraint that keeps the policy close to the reference policy. AutoRater evaluations and case studies indicate that Dr Genre can adapt its alignment direction to task requirements and outperform single-reward baselines, which suggests it is suited to general-purpose text rewriting in real-world user scenarios.

Abstract

Generic text rewriting is a prevalent large language model (LLM) application that covers diverse real-world tasks, such as style transfer, fact correction, and email editing. These tasks vary in rewriting objectives (e.g., factual consistency vs. semantic preservation), making it challenging to develop a unified model that excels across all dimensions. Existing methods often specialize in either a single task or a specific objective, limiting their generalizability. In this work, we introduce a generic model proficient in factuality, stylistic, and conversational rewriting tasks. To simulate real-world user rewrite requests, we use LLMs to construct a conversational rewrite dataset, ChatRewrite, from raw emails; it presents "natural"-sounding instructions. Combined with other popular rewrite datasets, including LongFact for the factuality rewrite task and RewriteLM for the stylistic rewrite task, this forms a broad benchmark for training and evaluating generic rewrite models. To align with task-specific objectives, we propose Dr Genre, a Decoupled-reward learning framework for Generic rewriting, which uses objective-oriented reward models with task-specific weighting. Evaluation shows that Dr Genre delivers higher-quality rewrites across all targeted tasks, improving objectives including instruction following (agreement), internal consistency (coherence), and minimal unnecessary edits (conciseness).
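The task-specific weighting described above can be pictured with a minimal Python sketch. It assumes each objective-oriented reward model returns one scalar score per rewrite and that the scores are combined by a weighted sum. The weight values, the function names `combine_rewards` and `kl_penalized_reward`, and the coefficient `beta` are hypothetical illustrations, not values or APIs taken from the paper.

```python
from typing import Dict

# Hypothetical per-task weights over the three objectives named in the abstract.
# The numbers are illustrative only; they are NOT taken from the paper.
TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "factuality": {"agreement": 0.6, "coherence": 0.2, "conciseness": 0.2},
    "stylistic": {"agreement": 0.4, "coherence": 0.3, "conciseness": 0.3},
    "conversational": {"agreement": 0.3, "coherence": 0.3, "conciseness": 0.4},
}


def combine_rewards(task: str, scores: Dict[str, float]) -> float:
    """Return the weighted sum of objective-specific scores for one rewrite."""
    weights = TASK_WEIGHTS[task]
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing reward scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def kl_penalized_reward(
    reward: float, logp_policy: float, logp_ref: float, beta: float = 0.1
) -> float:
    """Subtract a per-sample KL-style penalty, as is common in PPO-based RLHF.

    beta is an assumed penalty coefficient, not a value from the paper.
    """
    return reward - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    scores = {"agreement": 0.9, "coherence": 0.8, "conciseness": 0.5}
    combined = combine_rewards("factuality", scores)
    shaped = kl_penalized_reward(combined, logp_policy=-1.0, logp_ref=-1.2)
    print(f"combined={combined:.3f} shaped={shaped:.3f}")
```

Keeping the objective scores separate until the final weighted sum is what lets the same reward models serve every task: only the weight vector changes when the task changes.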
