Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

Anirudh Som, Karan Sikka, Helen Gent, Ajay Divakaran, Andreas Kathol, Dimitra Vergyri

TL;DR

This work tackles the challenge of paraphrasing offensive content to reduce harm while preserving the original meaning. It uses few-shot in-context learning (ICL) with large language models and examines how the number and order of demonstrations, as well as the presence of explicit instructions, affect generation quality and toxicity. A key contribution is the Context-Aware Polite Paraphrase (CAPP) dataset, which pairs rude utterances with polite paraphrases within dialogue context, enabling realistic evaluation. Across three datasets and multiple models, the authors show that ICL-based paraphrasers match supervised baselines in generation quality while reducing measured toxicity by up to ~76%, and that performance drops only slightly with as little as 10% of the training data. The results have practical implications for rapid deployment of safe paraphrasing systems in dialogue environments and suggest that open-source LLMs can approach closed-model performance on this task.

Abstract

Paraphrasing of offensive content is a better alternative to content removal and helps improve civility in a communication environment. Supervised paraphrasers, however, rely heavily on large quantities of labelled data to help preserve meaning and intent. They also often retain a large portion of the offensiveness of the original content, which raises questions about their overall usability. In this paper, we aim to assist practitioners in developing usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs), i.e., using a limited number of input-label demonstration pairs to guide the model in generating desired outputs for specific queries. Our study focuses on key factors such as the number and order of demonstrations, exclusion of the prompt instruction, and reduction in measured toxicity. We perform a principled evaluation on three datasets, including our proposed Context-Aware Polite Paraphrase (CAPP) dataset, comprising dialogue-style rude utterances, polite paraphrases, and additional dialogue context. We evaluate our approach using four closed-source LLMs and one open-source LLM. Our results reveal that ICL is comparable to supervised methods in generation quality, while being qualitatively better by 25% on human evaluation and attaining 76% lower toxicity. Also, ICL-based paraphrasers show only a slight reduction in performance even with just 10% of the training data.
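
To make the ICL setup concrete, here is a minimal sketch of how a few-shot paraphrasing prompt might be assembled from demonstration pairs, with the instruction made optional so that a demonstrations-only prompt can be built as well. Everything in it is an assumption for illustration rather than the authors' implementation: the function names, the `Input:`/`Output:` template, the placement of the most similar demonstration next to the query, and the token-overlap (Jaccard) score, which is a crude stand-in for whatever semantic similarity measure is actually used. The example utterances are invented placeholders, not dataset samples.

```python
from typing import List, Optional, Tuple

# A demonstration is an (offensive input, polite paraphrase) pair.
Demo = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for a semantic similarity model."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_demos(query: str, pool: List[Demo], k: int) -> List[Demo]:
    """Pick the k demonstrations whose inputs are most similar to the query.

    The result is ordered least-to-most similar, so the closest example sits
    immediately before the query in the prompt (an assumed ordering).
    """
    top_k = sorted(pool, key=lambda d: jaccard(query, d[0]), reverse=True)[:k]
    return top_k[::-1]


def build_prompt(query: str, demos: List[Demo], instruction: Optional[str] = None) -> str:
    """Assemble a prompt from an optional instruction, the demos, and the query."""
    parts = []
    if instruction:  # omit to get a demonstrations-only prompt
        parts.append(instruction)
    for source, target in demos:
        parts.append(f"Input: {source}\nOutput: {target}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Invented placeholder pairs, used only to exercise the functions.
    pool = [
        ("Shut up and listen to me.", "Please hear me out."),
        ("This report is garbage.", "This report needs more work."),
        ("Your idea is stupid.", "I am not sure this idea will work."),
    ]
    query = "Your report is stupid garbage."
    demos = select_demos(query, pool, k=2)
    print(build_prompt(query, demos, instruction="Rewrite the input politely."))
```

Varying `k`, swapping the selection function for a random sample, or passing `instruction=None` gives the kinds of prompt variants the study compares (number of demonstrations, demonstration selection and order, and the no-instruction setting). The resulting string would then be sent to whichever LLM is being evaluated.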

Paper Structure (23 sections, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Influence of number and order of demonstrations, and instruction, on BLEU Score performance and measured Toxicity using the text-davinci-003 model. Comparison is done between BART, instruction-only prompting, and three In-Context Learning approaches. Numbers on the $x$-axis represent number of demonstrations used in the In-Context Learning framework. Note, measured Toxicity for BART in ParaDetox is 82, exceeding the set $y$-axis limit.
  • Figure 2: BLEU as a function of number of demos. Noticeable improvement in BLEU is observed in the beginning, with performance saturating after a certain number of demos.
  • Figure 3: BLEU as a function of order of demonstrations and type of instruction used in the prompt design. Demonstrations that are semantically more similar to the query sample show better performance than less semantically similar and randomly selected samples. Also, prompts that only include demonstrations (i.e., No Instruction) show a BLEU score that is comparable to prompts that include both an instruction and demonstrations.
  • Figure 4: BLEU score and measured toxicity performance with different instructions but with the same set of demos. Instructions can either complement or work against the selected demos and accordingly affect the BLEU score. The No Instruction setting shows comparable BLEU to prompts that include both instructions and demos but results in paraphrases with higher toxicity. The dotted reference lines indicate the range in BLEU score under the No Instruction setting.
  • Figure 5: Average Toxicity measured using Detoxify. The orange dotted line serves as a reference for the Gold-Standard's Toxicity. U, GT, B-#, T-#, G-#, V-# along the $x$-axis refer to Utterance, Gold-Standard, Baseline methods, text-davinci-003, gpt-3.5-turbo, and Vicuna-13b, respectively. The # in T-#, G-#, V-# indicates the number of demonstrations used. Note, T-0, G-0, V-0 only contain an instruction in the prompt.
  • ...and 5 more figures