XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates

Haopeng Zhang; Hayate Iso; Sairam Gurajada; Nikita Bhutani

TL;DR

This paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. It combines LLM-based annotation and human annotation to enhance interpretability, and evaluations on it demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks.

Abstract

Text editing is the crucial task of modifying text to better align with user intent. However, existing text editing benchmark datasets contain only coarse-grained instructions and lack explainability, so model outputs can deviate from the intended changes outlined in the gold reference. To comprehensively investigate the text editing capabilities of large language models (LLMs), this paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. XATU covers finer-grained text editing tasks of varying difficulty (simplification, grammar check, fact-check, etc.), incorporating lexical, syntactic, semantic, and knowledge-intensive edit aspects. To enhance interpretability, we combine LLM-based annotation and human annotation, resulting in a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing LLMs against our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks. Furthermore, extensive experimentation reveals the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research at https://github.com/megagonlabs/xatu.

Paper Structure (26 sections, 1 equation, 6 figures, 6 tables)

Introduction
Related Work
Text Editing
Text Editing Datasets
The XATU Benchmark
Data Source
Grammar Error Correction
Simplification
Style Transfer
Information Update
Annotation Process
Challenges in Crowdsourcing
Candidate Generation by LLMs
Candidate Validation by Human
Benchmark Usage
...and 11 more sections

Figures (6)

Figure 1: Illustrated examples of coarse- and fine-grained instructions for text editing. LLMs can accurately perform text editing based on coarse-grained instructions, but may not meet the user's intention. In contrast, fine-grained instructions lead to accurate and user-intended text editing.
Figure 2: The instance format of the data in XATU benchmark. Data in blue (Input, Output, Reference) are extracted from the corresponding data sources, and data in green (Fine-grained instruction and explanation) are obtained from joint automatic and human annotations.
Figure 3: Illustrated example of adding HTML tags to the input and output to explicitly indicate the edited portion to LLMs.
Figure 4: Fine-tuning with fine-grained instructions (-fine) vs. coarse instructions (-c).
Figure 5: Boxplot comparing instruction-tuned LLMs (Flan-xx) vs. pre-trained counterparts with fine-grained (-fine) and coarse instructions (-c).
...and 1 more figure
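
The instance format in the Figure 2 caption and the edit tagging in the Figure 3 caption can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the field names of `EditInstance`, the `mark_edits` helper, the `<edit>` tag name, and the word-level diff via `difflib` are not taken from the paper or its repository, and the example sentence is invented.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Tuple


@dataclass
class EditInstance:
    """One benchmark-style record. Field names are hypothetical and only
    mirror the five parts listed in the Figure 2 caption."""
    input_text: str    # original text, from the data source
    output_text: str   # edited text, from the data source
    reference: str     # supporting reference, from the data source (may be empty)
    instruction: str   # fine-grained instruction, from joint LLM/human annotation
    explanation: str   # edit explanation, from joint LLM/human annotation


def mark_edits(source: str, target: str, tag: str = "edit") -> Tuple[str, str]:
    """Wrap the word spans that differ between source and target in <tag>...</tag>.

    The tag name and the whitespace tokenization are assumptions; the paper's
    Figure 3 only states that HTML tags mark the edited portion.
    """
    src, tgt = source.split(), target.split()
    marked_src, marked_tgt = [], []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            marked_src.extend(src[i1:i2])
            marked_tgt.extend(tgt[j1:j2])
            continue
        # replace/delete/insert: tag whichever side has tokens in this span
        if i2 > i1:
            marked_src.append(f"<{tag}>{' '.join(src[i1:i2])}</{tag}>")
        if j2 > j1:
            marked_tgt.append(f"<{tag}>{' '.join(tgt[j1:j2])}</{tag}>")
    return " ".join(marked_src), " ".join(marked_tgt)


if __name__ == "__main__":
    example = EditInstance(
        input_text="She go to school yesterday .",
        output_text="She went to school yesterday .",
        reference="",
        instruction="Fix the verb tense so it agrees with 'yesterday'.",
        explanation="'go' is changed to the past tense 'went'.",
    )
    tagged_in, tagged_out = mark_edits(example.input_text, example.output_text)
    print(tagged_in)
    print(tagged_out)
```

A word-level diff is used here only because it is simple to run with the standard library; a character-level or tokenizer-aware alignment would mark edits at a different granularity.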
