
Analysing Zero-Shot Readability-Controlled Sentence Simplification

Abdullah Barayan, Jose Camacho-Collados, Fernando Alva-Manchego

TL;DR

This paper investigates zero-shot, sentence-level readability-controlled text simplification (RCTS), using instruction-tuned LLMs to rewrite English sentences to target Common European Framework of Reference for Languages (CEFR) levels without parallel data. It systematically analyzes how contextual information in the prompt (CEFR level descriptors and examples) and the choice of model affect readability control and meaning preservation, evaluating on the CEFR-SP sentence dataset with both automatic metrics and human judgments. The findings show that models broadly struggle to reach the lower CEFR targets, that additional contextual information helps only unevenly, and that automatic metrics are notably misaligned with human perception of simplification quality, underscoring the need for evaluation methods tailored to RCTS. The work highlights the practical challenge of balancing readability against semantic fidelity and suggests directions, such as few-shot data and refined metrics, for improving future systems.

Abstract

Readability-controlled text simplification (RCTS) rewrites texts to lower readability levels while preserving their meaning. RCTS models often depend on parallel corpora with readability annotations on both the source and target sides. Such datasets are scarce and difficult to curate, especially at the sentence level. To reduce reliance on parallel data, we explore using instruction-tuned large language models for zero-shot RCTS. Through automatic and manual evaluations, we examine: (1) how different types of contextual information affect a model's ability to generate sentences with the desired readability, and (2) the trade-off between achieving the target readability and preserving meaning. Results show that all tested models struggle to simplify sentences (especially to the lowest levels), due both to the models' limitations and to characteristics of the source sentences that impede adequate rewriting. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS, since standard metrics often misinterpret common simplification operations and inaccurately assess readability and meaning preservation.
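To make the experimental variable concrete, the sketch below shows one way a zero-shot prompt could be assembled with or without extra context (a level descriptor, an example sentence at the target level). It is a minimal illustration under stated assumptions: the instruction wording, the `PromptContext` fields and the `build_prompt` function are hypothetical and are not the prompts used in the paper (those are shown in Figure 4); the descriptor and example strings in the usage block are placeholders, not real CEFR descriptors.

```python
from dataclasses import dataclass
from typing import Optional

# CEFR levels, ordered from most basic (A1) to most proficient (C2).
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


@dataclass
class PromptContext:
    """Optional extra context for the prompt (hypothetical field names)."""
    descriptor: Optional[str] = None  # prose description of the target level
    example: Optional[str] = None     # an example sentence written at the target level


def build_prompt(source: str, target: str, ctx: Optional[PromptContext] = None) -> str:
    """Assemble a zero-shot rewriting prompt for one source sentence.

    The instruction wording is illustrative, not the prompt used in the paper.
    """
    if target not in CEFR_LEVELS:
        raise ValueError(f"unknown CEFR level: {target!r}")
    ctx = ctx or PromptContext()
    lines = [
        f"Rewrite the following English sentence at CEFR level {target}, "
        "preserving its meaning."
    ]
    # Each optional piece of context is appended only when supplied, so one
    # function yields the bare, descriptor-only and descriptor+example variants.
    if ctx.descriptor:
        lines.append(f"Description of level {target}: {ctx.descriptor}")
    if ctx.example:
        lines.append(f"Example sentence at level {target}: {ctx.example}")
    lines.append(f"Sentence: {source}")
    lines.append("Rewritten sentence:")
    return "\n".join(lines)


if __name__ == "__main__":
    src = "The committee deferred its decision pending further consultation."
    variants = {
        "bare": None,
        "descriptor": PromptContext(descriptor="<descriptor of A2>"),
        "descriptor+example": PromptContext(
            descriptor="<descriptor of A2>", example="<an A2 sentence>"
        ),
    }
    for name, ctx in variants.items():
        print(f"--- {name} ---")
        print(build_prompt(src, "A2", ctx))
```

Keeping the context optional and additive means that the prompt variants differ only in the pieces being compared, which is the property a controlled comparison of contextual information needs.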

Paper Structure

This paper contains 51 sections, 7 figures, and 16 tables.

Figures (7)

  • Figure 1: Example of an RCTS model rewriting a text with CEFR level C2 to either A1 or B1 levels.
  • Figure 2: Heatmaps of the RMSE scores of the best models, broken down by source and target readability level.
  • Figure 3: Line graphs comparing the syntactic and lexical complexity of simplifications generated by the top-performing models with that of the source and target levels.
  • Figure 4: Prompts provided to instruction-tuned LLMs.
  • Figure 5: Screenshot of the Human Readability Assessment.
  • ...and 2 more figures
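Figure 2 reports readability control as RMSE broken down by source and target level. The sketch below shows how such a heatmap could be computed, assuming that CEFR levels are encoded as integers (A1 = 1 to C2 = 6), that each model output has already been assigned a level (for example by a classifier or a human annotator), and that RMSE measures the distance between the requested target level and that assigned level. The encoding, record format and the names `rmse_by_cell` and `as_grid` are hypothetical; the paper's exact procedure is not reproduced here, and the toy records are invented.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical ordinal encoding of CEFR levels (A1 = 1 ... C2 = 6).
CEFR_TO_INT = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

Cell = Tuple[str, str]  # (source level, target level)


def rmse_by_cell(records: Iterable[Tuple[str, str, str]]) -> Dict[Cell, float]:
    """RMSE between requested and assessed levels for each (source, target) pair.

    Each record is (source_level, target_level, assessed_level), where
    assessed_level is the level assigned to the model's output.
    """
    squared: Dict[Cell, List[int]] = defaultdict(list)
    for src, tgt, assessed in records:
        err = CEFR_TO_INT[assessed] - CEFR_TO_INT[tgt]
        squared[(src, tgt)].append(err * err)
    return {cell: math.sqrt(sum(v) / len(v)) for cell, v in squared.items()}


def as_grid(cells: Dict[Cell, float]) -> List[List[Optional[float]]]:
    """Lay cell scores out as a heatmap matrix: rows = source, columns = target.

    Cells with no data (e.g. pairs absent from the records) are None.
    """
    levels = list(CEFR_TO_INT)
    return [[cells.get((src, tgt)) for tgt in levels] for src in levels]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    records = [
        ("C2", "A1", "B1"),  # output judged two levels too complex
        ("C2", "A1", "A2"),  # output judged one level too complex
        ("C2", "B1", "B1"),  # output on target
    ]
    cells = rmse_by_cell(records)
    for row in as_grid(cells):
        print(row)
```

Grouping errors by (source, target) before averaging is what lets a heatmap separate two effects the paper discusses: how far a sentence must be simplified, and how low the requested level is.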