
Progressive Document-level Text Simplification via Large Language Models

Dengzhao Fang, Jipeng Qiang, Yi Zhu, Yunhao Yuan, Wei Li, Yan Liu

TL;DR

Long document-level text simplification (DS) remains challenging because the output must preserve the source content and stay coherent. The paper proposes ProgDS, a hierarchical, three-level framework (discourse-, topic-, and lexical-level) that coordinates LLMs across multiple stages and mitigates hallucinations through over-generation and filtering. Empirical results on Wiki-auto and Newsela show state-of-the-art performance compared with smaller models and direct prompting, with stronger gains on longer documents. The work advances practical long-form DS and lays groundwork for multilingual DS and for applications to other long-document tasks.

Abstract

Research on text simplification has primarily focused on lexical and sentence-level changes. Long document-level simplification (DS) is still relatively unexplored. Large Language Models (LLMs), like ChatGPT, have excelled in many natural language processing tasks. However, their performance on DS tasks is unsatisfactory, as they often treat DS as mere document summarization. For the DS task, the generated long sequences must not only remain consistent with the original document throughout, but also carry out moderate simplification operations at the discourse, sentence, and word levels. Human editors simplify documents with a hierarchical strategy that addresses complexity level by level. This study simulates that strategy through multi-stage collaboration among LLMs. We propose a progressive simplification method (ProgDS) that hierarchically decomposes the task into discourse-level, topic-level, and lexical-level simplification. Experimental results demonstrate that ProgDS significantly outperforms existing smaller models and direct prompting with LLMs, advancing the state of the art in the document simplification task.
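
The hierarchical decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable interface, the prompt wording, and the `split_topics` helper (which treats blank-line-separated paragraphs as topic segments) are all assumptions made for illustration, and the over-generation and filtering step is omitted. It only shows the control flow of one discourse-level pass over the whole document followed by topic-level and lexical-level passes on each segment.

```python
from typing import Callable, List

# Hypothetical interface: a function that takes a prompt and returns a completion.
LLM = Callable[[str], str]


def split_topics(document: str) -> List[str]:
    """Assumption: blank-line-separated paragraphs stand in for topic segments."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def progressive_simplify(document: str, llm: LLM) -> str:
    # Stage 1: a single discourse-level pass over the whole document.
    restructured = llm(
        f"[discourse-level] Reorganize and simplify the document structure:\n{document}"
    )

    simplified_segments = []
    for segment in split_topics(restructured):
        # Stage 2: topic-level pass, run once per segment.
        topic_out = llm(
            f"[topic-level] Simplify the sentences in this passage:\n{segment}"
        )
        # Stage 3: lexical-level pass, run once per segment on the stage 2 output.
        lexical_out = llm(
            f"[lexical-level] Replace difficult words with simpler ones:\n{topic_out}"
        )
        simplified_segments.append(lexical_out.strip())

    return "\n\n".join(simplified_segments)


def echo_llm(prompt: str) -> str:
    """Stand-in model for dry runs: returns the text after the instruction line."""
    return prompt.split("\n", 1)[1]


if __name__ == "__main__":
    print(progressive_simplify("First topic.\n\nSecond topic.", echo_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real system would replace `echo_llm` with an actual model call and the placeholder prompts with task-specific templates.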

Paper Structure

This paper contains 16 sections, 1 equation, 18 figures, 7 tables.

Figures (18)

  • Figure 1: The framework of Summary-enhanced simplification (SumDS) for generating a simpler version. The source document is divided based on its content, and each segment is simplified separately, guided by the summary. The simplified segments are then concatenated to form the final output.
  • Figure 2: The prompting template of the Summarizer in SumDS, where the contents within "[]" are variables.
  • Figure 3: The prompting template of the Paragraph-simplifier in SumDS. "(Examples)" refers to the examples needed for few-shot learning and chain-of-thought.
  • Figure 4: The framework of progressive simplification (ProgDS). The three levels (discourse-level, topic-level, and lexical-level simplification) are performed sequentially. Moreover, the topic-level and lexical-level simplification are executed multiple times within a document.
  • Figure 5: The prompting template of the Discourse-simplifier in ProgDS.
  • ...and 13 more figures