Let's Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification

Jingshen Zhang, Xin Ying Qiu, Lifang Lu, Zhuhua Huang, Yutao Hu, Yuechang Wu, JunYu Lu

TL;DR

This paper addresses the challenge of CEFR-proficiency-controlled sentence simplification with large readability spans using a three-part framework: dynamic path planning to decompose large jumps, semantic-guided exemplar selection to preserve meaning, and few-shot chain-of-thought with chat history to maintain coherent reasoning. The approach yields up to 20 percentage-point improvements in target-level accuracy and 22–42% reductions in inference steps across five languages, validated by automatic metrics and human judgments. A key finding is the persistent trade-off between readability control and semantic fidelity, which becomes more pronounced with larger spans. The work demonstrates a viable, multilingual path toward more controllable simplification, while highlighting open challenges in preserving meaning during extensive level reductions.

Abstract

Large language models demonstrate limited capability in proficiency-controlled sentence simplification, particularly when the source and target readability levels are far apart. We propose a framework that decomposes complex simplifications into manageable steps through dynamic path planning, semantic-aware exemplar selection, and chain-of-thought generation with conversation history for coherent reasoning. Evaluation on five languages across two benchmarks shows that our approach improves simplification effectiveness while reducing computational steps by 22–42%. Human evaluation confirms the fundamental trade-off between simplification effectiveness and meaning preservation. Notably, even human annotators struggle to agree on semantic preservation judgments, highlighting the inherent complexity of this task. Our work shows that while step-by-step simplification improves control, preserving semantic fidelity during extensive simplification remains an open challenge.
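
To make the idea of decomposing a large readability jump concrete, the following is a minimal sketch of how a simplification path between two CEFR levels could be enumerated. It is not the authors' planner: the abstract does not say how the dynamic path is chosen, so the function name `plan_path` and the fixed `max_stride` parameter are hypothetical, used only to show how allowing larger strides shortens the path.

```python
# Illustrative only: a fixed-stride planner, NOT the paper's dynamic path planning.
# CEFR levels ordered from easiest to hardest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def plan_path(source: str, target: str, max_stride: int = 1) -> list[str]:
    """Return the intermediate CEFR targets visited when simplifying
    from `source` down to `target` (hypothetical helper).

    `max_stride` is an assumed knob: the largest number of levels
    that a single simplification step is allowed to drop.
    """
    if max_stride < 1:
        raise ValueError("max_stride must be at least 1")
    src = CEFR_LEVELS.index(source)
    tgt = CEFR_LEVELS.index(target)
    if tgt > src:
        raise ValueError("target level must not be harder than source level")

    path = []
    current = src
    while current > tgt:
        # Drop by at most max_stride levels, never overshooting the target.
        current = max(tgt, current - max_stride)
        path.append(CEFR_LEVELS[current])
    return path


if __name__ == "__main__":
    print(plan_path("C2", "A1"))                # one level per step
    print(plan_path("C2", "A1", max_stride=2))  # fewer, larger steps
```

In this sketch, a C2-to-A1 simplification takes five steps at one level per step and three steps when each step may drop two levels. A planner that adapts the stride per sentence, as the paper's description suggests, would sit between these extremes.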

Paper Structure (34 sections, 4 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Research Framework
  • Figure 2: Prompt Design for Few-shot CoT with Chat Template and Chat History
  • Figure 3: Screenshot of Human Meaning Preservation Judgement: Instruction
  • Figure 4: Screenshot of Human Meaning Preservation Judgement: Annotation