Task-Oriented Paraphrase Analytics

Marcel Gohsen; Matthias Hagen; Martin Potthast; Benno Stein

TL;DR

The distributions of task-specific instances in the known paraphrase corpora vary substantially, so using these corpora without clearly defining the respective paraphrase conditions must lead to incomparable and misleading results.

Abstract

Since paraphrasing is an ill-defined task, the term "paraphrasing" covers text transformation tasks with different characteristics. Consequently, existing paraphrasing studies have applied quite different (explicit and implicit) criteria as to when a pair of texts is to be considered a paraphrase, all of which amount to postulating a certain level of semantic or lexical similarity. In this paper, we conduct a literature review and propose a taxonomy to organize the 25 identified paraphrasing (sub-)tasks. Using classifiers trained to identify the tasks that a given paraphrasing instance fits, we find that the distributions of task-specific instances in the known paraphrase corpora vary substantially. This means that the use of these corpora, without the respective paraphrase conditions being clearly defined (which is the normal case), must lead to incomparable and misleading results.

Paper Structure (44 sections, 4 figures, 1 table)

Introduction
Related Work
Paraphrase Definition
Paraphrase Typology
Paraphrase Generation
Paraphrase Corpora
Paraphrasing Task Taxonomy
Semantically Equivalent Paraphrasing
Copy Editing
Improvement of Coherence
Text Simplification
Sentence Compression and Expansion
Data Augmentation
Adversarial Example Generation
Linguistic Steganography
...and 29 more sections

Figures (4)

Figure 1: Taxonomy of paraphrase generation tasks which require generated paraphrases to be semantically equivalent to the original text.
Figure 2: Taxonomy of paraphrase generation tasks which allow generated paraphrases to be semantically similar and not necessarily identical to the original text.
Figure 3: Confusion matrix of manually annotated tasks and the actual tasks of paraphrases from task-specific corpora.
Figure 4: Confusion matrix of the automatically predicted paraphrasing tasks and the actual tasks of paraphrases from task-specific corpora in the sampled test-set.
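
The confusion matrices in Figures 3 and 4 compare assigned task labels with the actual tasks of the source corpora. As a minimal sketch of how such a matrix can be tallied, the following uses toy labels that are purely hypothetical (the task names are borrowed from the section list above; the label sequences and the helper `confusion_matrix` are assumptions for illustration, not the authors' data or code).

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual tasks, columns are predicted tasks."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical toy labels, not data from the paper.
labels = ["text simplification", "sentence compression", "copy editing"]
actual = [
    "text simplification",
    "text simplification",
    "sentence compression",
    "copy editing",
]
predicted = [
    "text simplification",
    "sentence compression",
    "sentence compression",
    "copy editing",
]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:22s} {row}")
```

Off-diagonal cells mark instances whose assigned task differs from the task of the corpus they came from; here one simplification instance is labeled as sentence compression.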
