Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

Lorenzo Cima, Alessio Miaschi, Amaury Trujillo, Marco Avvenuti, Felice Dell'Orletta, Stefano Cresci

TL;DR

This work addresses online toxicity by advancing contextualized counterspeech that adapts to the community context and to the characteristics of the moderated user, implemented with a 13B-parameter LLaMA2 model. It systematically evaluates 36 generation configurations across adaptation and personalization factors, using both algorithmic indicators and a pre-registered crowdsourced human evaluation to assess politeness, adequacy, relevance, diversity, truthfulness, and persuasiveness. Key findings show that contextualized counterspeech can outperform generic baselines in adequacy and persuasiveness, but automated metrics and human judgments diverge notably, highlighting the need for nuanced, mixed evaluation approaches and increased human-AI collaboration in moderation. The results point to trade-offs among adaptation and personalization strategies and suggest that larger LLMs and more sophisticated evaluation methodologies may be needed to fully realize the potential of contextualized counterspeech on real-world platforms.

Abstract

AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct an LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.

Paper Structure

This paper contains 35 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Current AI-generated counterspeech only leverages the content of the toxic message. Here, we generate contextualized counterspeech that also leverages information about the community, the conversation, and the moderated user to craft more persuasive responses.
  • Figure 2: Algorithmic evaluation results for each factor. For each factor (y axis) and indicator (panels), the teal dot shows the mean value of the indicator when the factor is used in the evaluated configurations, while the sand dot indicates the mean value of the indicator when the factor is not used. Arrows specify whether larger $\uparrow$ or smaller $\downarrow$ scores are better.
  • Figure 3: Human evaluation results (non-contextual condition). Effect sizes and confidence intervals of the scores assigned to several configurations compared to the baseline. Statistical significance: ***: $p < 0.01$.
  • Figure 4: Human evaluation results (contextual condition). Effect sizes and confidence intervals of the scores assigned to several configurations compared to the baseline. Statistical significance: ***: $p < 0.01$, **: $p < 0.05$, *: $p < 0.1$.
  • Figure 5: Differences in human evaluation results between the contextual and non-contextual conditions. Statistical significance: ***: $p < 0.01$, **: $p < 0.05$, *: $p < 0.1$.
  • ...and 2 more figures