ChatGPT vs Human-authored Text: Insights into Controllable Text Summarization and Sentence Style Transfer

Dongqi Liu; Vera Demberg

TL;DR

This work systematically examines ChatGPT’s ability in two controllable text-generation tasks: audience-specific summarization and sentence formality transfer, comparing outputs to human-authored text. Using zero-shot prompts on the ELIFE and GYAFC datasets, the study analyzes readability, content fidelity, and stylistic differences via metrics like ROUGE, SummaC, BLEU, and POS/dependency distributions, plus hallucination checks. Key findings show humans exhibit larger stylistic variation than ChatGPT, and ChatGPT’s outputs can deviate from source semantics and exhibit hallucinations, though prompt engineering and example-guided prompts can improve alignment somewhat. The work highlights practical implications for deploying LLMs in controllable writing tasks, underscoring reliability concerns and the value of guided prompts to offset gaps with human performance.

Abstract

Large-scale language models, like ChatGPT, have garnered significant media attention and stunned the public with their remarkable capacity for generating coherent text from short natural language prompts. In this paper, we aim to conduct a systematic inspection of ChatGPT's performance in two controllable generation tasks, with respect to ChatGPT's ability to adapt its output to different target audiences (expert vs. layman) and writing styles (formal vs. informal). Additionally, we evaluate the faithfulness of the generated text, and compare the model's performance with human-authored texts. Our findings indicate that the stylistic variations produced by humans are considerably larger than those demonstrated by ChatGPT, and the generated texts diverge from human samples in several characteristics, such as the distribution of word types. Moreover, we observe that ChatGPT sometimes incorporates factual errors or hallucinations when adapting the text to suit a specific style.
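The comparison of "the distribution of word types" mentioned above can be illustrated with a small sketch. It assumes the texts have already been part-of-speech tagged by some external tool. The tag lists, the function names `tag_distribution` and `absolute_differences`, and the toy data are hypothetical illustrations, not the authors' pipeline or tag set.

```python
from collections import Counter


def tag_distribution(tags):
    """Return the relative frequency of each tag in a flat list of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def absolute_differences(tags_a, tags_b):
    """Per-tag absolute difference between two relative-frequency distributions."""
    dist_a = tag_distribution(tags_a)
    dist_b = tag_distribution(tags_b)
    all_tags = sorted(set(dist_a) | set(dist_b))
    # A tag missing from one side counts as frequency 0.0 there.
    return {t: abs(dist_a.get(t, 0.0) - dist_b.get(t, 0.0)) for t in all_tags}


# Toy, hand-written tag sequences (assumed data, for illustration only).
human_tags = ["NOUN", "VERB", "ADJ", "NOUN"]
model_tags = ["NOUN", "VERB", "NOUN", "NOUN"]

diffs = absolute_differences(human_tags, model_tags)
for tag, diff in diffs.items():
    print(f"{tag}: {diff:.2f}")
```

A larger value for a tag means the two sets of texts use that word type at more divergent rates. The sketch measures only relative frequencies and does not account for text length or tagging errors.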

Paper Structure (38 sections, 15 figures, 9 tables)

Introduction
Related Work
Controllable Text Summarization
Text Style Transfer
ChatGPT
Study on Controllable Summarization
Prompt Formulation
Experimental Setup
Dataset
Metrics
Results on Controllable Summarization
Effect of Prompt Formulation
Reading Difficulty Control
Comparison to Previous SOTA Model
Disparities in Summarization Behavior
...and 23 more sections

Figures (15)

Figure 1: Comparison of abstractiveness between ChatGPT and human-generated summaries
Figure 2: Summary consistency detection. L stands for layman, E for expert.
Figure 3: Absolute differences in POS tags distribution of ChatGPT and human-generated sentences: GYAFC - EM
Figure 4: Dependency arc entailment: GYAFC - EM. Data points $> 0.95 \approx$ accurate. To clarify discrepancies, cutoff point $= 0.95$.
Figure 5: Absolute differences in dependency labels distribution of ChatGPT and human-generated formal style sentences: GYAFC - EM
...and 10 more figures
