Progressive Document-level Text Simplification via Large Language Models
Dengzhao Fang, Jipeng Qiang, Yi Zhu, Yunhao Yuan, Wei Li, Yan Liu
TL;DR
Simplifying long documents remains challenging because the output must both preserve the source content and stay coherent. The paper proposes ProgDS, a hierarchical three-level framework (discourse, topic, and lexical levels) in which LLMs collaborate over multiple stages, with over-generation and filtering used to mitigate hallucinations. On Wiki-auto and Newsela, ProgDS achieves state-of-the-art results against both smaller models and direct prompting, with larger gains on longer documents. The work advances practical long-form document simplification (DS) and lays groundwork for multilingual DS and for applications to other long-document tasks.
Abstract
Research on text simplification has focused primarily on lexical and sentence-level changes, while long document-level simplification (DS) remains relatively unexplored. Large Language Models (LLMs) such as ChatGPT have excelled at many natural language processing tasks, but their performance on DS is unsatisfactory because they often treat it as mere document summarization. In DS, the long generated sequence must not only stay consistent with the original document throughout but also apply a moderate degree of simplification at the discourse, sentence, and word levels. Human editors simplify documents hierarchically, working from coarse structure down to individual words. This study simulates that strategy through multi-stage collaboration among LLMs. We propose a progressive simplification method (ProgDS) that decomposes the task hierarchically into discourse-level, topic-level, and lexical-level simplification. Experimental results demonstrate that ProgDS significantly outperforms both existing smaller models and direct prompting of LLMs, advancing the state of the art in document simplification.
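To make the hierarchical decomposition concrete, the following is a minimal sketch of a three-stage pipeline, not the authors' implementation. It assumes that the discourse-level stage operates on the whole document and that the topic-level and lexical-level stages operate on each resulting paragraph in turn. The `LLM` callable, the function names, and the prompt wording are all hypothetical, and the sketch omits the over-generation and filtering step.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def _ask(llm: LLM, instruction: str, text: str) -> str:
    """Send one instruction plus the text it applies to, separated by a blank line."""
    return llm(f"{instruction}\n\n{text}").strip()


def simplify_discourse(document: str, llm: LLM) -> List[str]:
    """Stage 1 (assumed scope: whole document): restructure, then split into paragraphs."""
    out = _ask(
        llm,
        "Reorganize this document so its structure is easier to follow. "
        "Keep all key content. Separate paragraphs with a blank line.",
        document,
    )
    return [p.strip() for p in out.split("\n\n") if p.strip()]


def simplify_topic(paragraph: str, llm: LLM) -> str:
    """Stage 2 (assumed scope: one paragraph): simplify sentences within a topic."""
    return _ask(
        llm,
        "Rewrite this paragraph with shorter, simpler sentences. "
        "Do not add or remove facts.",
        paragraph,
    )


def simplify_lexical(paragraph: str, llm: LLM) -> str:
    """Stage 3 (assumed scope: one paragraph): replace difficult words."""
    return _ask(
        llm,
        "Replace difficult words in this paragraph with simpler synonyms. "
        "Change nothing else.",
        paragraph,
    )


def progressive_simplify(document: str, llm: LLM) -> str:
    """Run the stages coarse to fine and rejoin the simplified paragraphs."""
    paragraphs = simplify_discourse(document, llm)
    simplified = []
    for paragraph in paragraphs:
        paragraph = simplify_topic(paragraph, llm)
        paragraph = simplify_lexical(paragraph, llm)
        simplified.append(paragraph)
    return "\n\n".join(simplified)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so each stage could be served by a different model or prompt without changing the control flow.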
