What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs

Anna Wegmann, Tijs van den Broek, Dong Nguyen

TL;DR

This work defines and operationalizes context-dependent paraphrases in dialog and introduces ContextDeP, a dataset of 600 guest-host utterance pairs from NPR and CNN interviews with 5,581 annotations. It provides an annotation framework for identifying paraphrase spans across turns, examines label variation, and reports promising results for paraphrase detection in dialog using both DeBERTa token classification and in-context learning with multiple LLMs. The study highlights the difficulty of establishing ground truth in contextual paraphrase tasks and shows that GPT-4 excels at classification while DeBERTa-based token classifiers excel at span highlighting. By releasing data, code, and models, it enables future research on evaluating and improving paraphrase detection in dialog and on its use in dialog systems and social science analyses.

Abstract

Best practices for high-conflict conversations like counseling or customer support almost always include recommendations to paraphrase the previous speaker. Although paraphrase classification has received widespread attention in NLP, paraphrases are usually considered independent from context, and common models and datasets are not applicable to dialog settings. In this work, we investigate paraphrases in dialog (e.g., Speaker 1: "That book is mine." becomes Speaker 2: "That book is yours."). We provide an operationalization of context-dependent paraphrases, and develop a training for crowd-workers to classify paraphrases in dialog. We introduce a dataset with utterance pairs from NPR and CNN news interviews annotated for context-dependent paraphrases. To enable analyses on label variation, the dataset contains 5,581 annotations on 600 utterance pairs. We present promising results with in-context learning and with token classification models for automatic paraphrase detection in dialog.

Paper Structure

This paper contains 39 sections, 17 figures, and 18 tables.

Figures (17)

  • Figure 1: Context-Dependent Paraphrase in a News Interview. The interview host paraphrases part of the guest's utterance. It is only a paraphrase in the current context (e.g., doing something 20 times and doing something for a while are not generally synonymous). Our annotators provide word-level highlighting. The color's intensity shows the share of annotators that selected the word. Here, most annotators selected the same text spans; some included "from Rome" as part of what is paraphrased by the host. We underline the paraphrase identified by our fine-tuned DeBERTa token classifier.
  • Figure 2: Label distribution after first author annotations. First author label classification was performed in two batches. The first batch consists of 750 text pairs, the second of 3,700.
  • Figure 3: Distribution of Labels by Lead Author. We display the estimated number of (non-)paraphrases from the lead author annotations for the random subsample (RANDOM), the BALANCED sample, and the wider paraphrase variety sample (PARA). Note that RANDOM consists of 100 elements, but only 98 are included in this statistic (leading to numbers like 6.1). Two pairs were not classified by the lead author because they were too ambiguous or lacked the context information needed to reach a decision. We exclude such pairs in all other samples.
  • Figure 4: Annotator Training (1). Definition Paraphrase
  • Figure 5: Annotator Training (2). Comprehension Check Paraphrase. Variations of the shown highlighting are accepted.
  • ...and 12 more figures