Quantifying the Risks of Tool-assisted Rephrasing to Linguistic Diversity

Mengying Wang, Andreas Spitz

TL;DR

This paper measures the semantic and vocabulary change enacted by the use of rephrasing tools on a multi-domain corpus of human-generated text to quantify the risk of language change when adopted by a large user base.

Abstract

Writing assistants and large language models see widespread use in the creation of text content. While their effectiveness for individual users has been evaluated in the literature, little is known about their proclivity to change language or reduce its richness when adopted by a large user base. In this paper, we take a first step towards quantifying this risk by measuring the semantic and vocabulary change enacted by the use of rephrasing tools on a multi-domain corpus of human-generated text.

Paper Structure

This paper contains 38 sections, 6 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Percentual difference in text length after rephrasing with writing assistants (green) and LLMs (purple). Error bars denote 99% confidence intervals.
  • Figure 2: Percentual changes in total vocabulary size between all input and rephrased texts of a given type.
  • Figure 3: Jaccard overlap of input text and rephrased texts for assistants (green) and LLMs (purple). Error bars denote 99% confidence intervals.
  • Figure 4: Percentual changes in conicity after rephrasing for WATs (green) and LLMs (purple). Error bars denote 99% confidence intervals.
  • Figure 5: Percentual changes in the volume of the convex hull between input and rephrased texts using BERT and GPT-2 embeddings for assistants (green) and LLMs (purple). Error bars denote 99% confidence intervals.
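
Two of the measures named in the captions, Jaccard overlap (Figure 3) and conicity (Figure 4), can be illustrated with a minimal Python sketch. It rests on assumptions that this overview does not confirm: Jaccard overlap is computed on sets of whitespace-separated tokens, and conicity follows the common definition as the mean cosine similarity between each embedding and the mean embedding. The function names `jaccard_overlap` and `conicity` and the toy inputs are hypothetical; the paper's own tokenization and embedding pipeline may differ.

```python
import math


def jaccard_overlap(tokens_a, tokens_b):
    """Jaccard overlap of two token lists, treated as sets (assumed definition)."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(a & b) / len(a | b)


def _cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def conicity(vectors):
    """Mean cosine similarity between each vector and the mean vector (assumed definition)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return sum(_cosine(v, mean) for v in vectors) / n


# Toy example: an input sentence and a hypothetical rephrasing of it.
original = "the cat sat on the mat".split()
rephrased = "the cat rested on the rug".split()
overlap = jaccard_overlap(original, rephrased)

# Toy 2-d "embeddings"; real embeddings (e.g. BERT or GPT-2) have far more dimensions.
spread = conicity([[1.0, 0.0], [0.0, 1.0]])
```

Under these definitions, a lower Jaccard overlap means the rephrased text shares less vocabulary with its input, and a higher conicity means the embeddings point in more similar directions, that is, they occupy a narrower cone.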