InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li

TL;DR

This paper presents InfoLossQA, a framework that characterizes and recovers simplification-induced information loss through reader-centric QA pairs grounded in the theory of Question Under Discussion. The authors build a linguist-curated dataset of 1,000 QA pairs from 104 randomized controlled trial (RCT) abstracts simplified by GPT-4, showing that information loss is frequent and that QA pairs can summarize what was elided. They develop two automatic methods: end-to-end prompting of open-source and commercial LLMs, and a natural language inference (NLI) pipeline, each with grounding for localization. An evaluation framework combining correctness, linguistic suitability, and recall, validated by expert judgments, shows that while models can generate valid QA pairs, they struggle to reliably identify information loss and to match human judgments of what counts as loss. The NLI approach provides higher recall but at coarser granularity, pointing to avenues for refinement and for better interactive simplification tools.

Abstract

Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply the same standards as humans for what constitutes information loss.

Paper Structure

This paper contains 64 sections, 18 figures, 9 tables.

Figures (18)

  • Figure 1: The goal of InfoLossQA is to generate a series of QA pairs that reveal to lay readers what information a simplified text lacks compared to its original.
  • Figure 2: Example with a Deletion ("acute/chronic") and an Oversimplification ("improve arm function" is too broad given that EMS improves "artery function"). These give rise to two QA pairs ($Q_1$ and $Q_2$) which fulfill the Readability and Givenness constraints. For contrast, $Q_1'$ violates (⚡) givenness. $Q_1$ is likely more natural to lay readers because it could be asked without having seen the original text (no presupposition that the study looked at short-term and long-term effects).
  • Figure 3: Distribution of information loss. Humans produce a similar distribution of questions by section (a), but the questions differ in their localization (c). A similar localization results in more similar questions (b). Comparing humans to models, we see differences in where questions are localized, and by extension also in what they are about.
  • Figure 4: Qualitative examples demonstrating error cases. More examples in Figure A.1.
  • Figure A.1: Qualitative examples demonstrating error cases. Continued from Figure 4.
  • ...and 13 more figures