Policy-based Sentence Simplification: Replacing Parallel Corpora with LLM-as-a-Judge

Xuanxin Wu; Yuki Arase; Masaaki Nagata

TL;DR

This work introduces a policy-driven framework for sentence simplification that uses an LLM-as-a-Judge to generate policy-aligned preference data, eliminating the need for costly parallel corpora. Training with Adaptive Rejection Preference Optimization (ARPO) on data produced from multiple LLMs yields outputs aligned with two edit policies: lexical paraphrasing and overall rewriting. The method enables small open-source LLMs to surpass GPT-4o on lexical-oriented simplification and to reach comparable performance on overall rewriting, with strong human agreement and robust out-of-domain transfer. The approach offers a scalable, controllable way to tailor text simplification to diverse audiences and applications, including education and accessibility tools.

Abstract

Sentence simplification aims to modify a sentence to make it easier to read and understand while preserving the meaning. Different applications require distinct simplification policies, such as replacing only complex words at the lexical level or rewriting the entire sentence while trading off details for simplicity. However, achieving such policy-driven control remains an open challenge. In this work, we introduce a simple yet powerful approach that leverages Large Language Model-as-a-Judge (LLM-as-a-Judge) to automatically construct policy-aligned training data, completely removing the need for costly human annotation or parallel corpora. Our method enables building simplification systems that adapt to diverse simplification policies. Remarkably, even small-scale open-source LLMs such as Phi-3-mini-3.8B surpass GPT-4o on lexical-oriented simplification, while achieving comparable performance on overall rewriting, as verified by both automatic metrics and human evaluations. The consistent improvements across model families and sizes demonstrate the robustness of our approach.
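
The idea of using a judge to turn unlabeled candidate simplifications into preference data can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the authors' implementation: `PreferencePair`, `build_preference_pair`, the `min_margin` filter, and the best-versus-worst selection rule are hypothetical, and `toy_lexical_judge` is a hand-written heuristic standing in for a real LLM judge prompted with a lexical-paraphrasing policy. The abstract does not specify the judge prompts, the scoring scale, or how pairs are selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signature: (source, candidate) -> policy-alignment score.
Judge = Callable[[str, str], float]


@dataclass
class PreferencePair:
    """One training example for preference optimization (illustrative)."""
    source: str
    chosen: str
    rejected: str


def build_preference_pair(
    source: str,
    candidates: List[str],
    judge: Judge,
    min_margin: float = 1.0,
) -> Optional[PreferencePair]:
    """Score candidates with the judge and pair the best with the worst.

    Returns None when there are fewer than two distinct candidates or when
    the score gap is below min_margin (an assumed filtering rule).
    """
    unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    if len(unique) < 2:
        return None
    scored = sorted(((judge(source, c), c) for c in unique), key=lambda t: t[0])
    low_score, rejected = scored[0]
    high_score, chosen = scored[-1]
    if high_score - low_score < min_margin:
        return None
    return PreferencePair(source=source, chosen=chosen, rejected=rejected)


def toy_lexical_judge(source: str, candidate: str) -> float:
    """Stand-in for an LLM judge applying a lexical-paraphrasing policy.

    Penalizes changes in word count (a crude proxy for structural rewriting)
    and penalizes words longer than 7 characters (a crude proxy for
    lexical complexity). A real system would call an LLM here.
    """
    length_penalty = abs(len(source.split()) - len(candidate.split()))
    long_words = sum(1 for w in candidate.split() if len(w) > 7)
    return -(4.0 * length_penalty) - float(long_words)


if __name__ == "__main__":
    src = "The committee will commence the investigation"
    cands = [
        "The committee will start the investigation",
        "The committee will commence the investigation",
        "They will look into it",
    ]
    print(build_preference_pair(src, cands, toy_lexical_judge))
```

In this sketch, swapping in a judge written for a different policy (for example, one that rewards whole-sentence rewriting) would change which candidate is chosen without any parallel corpus; the resulting pairs would then be passed to a preference-optimization trainer.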
