Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech

Ghadi Alyahya; Abeer Aldayel

TL;DR

This work investigates how three coarse persuasion modes (reason, emotion, and credibility) manifest in counterspeech against online hate across closed (multi-turn) and open (single-turn) conversations. By annotating two major datasets (DialogConan for closed dialogs and Albanyan for open posts) and comparing human counterspeech with machine-generated counterspeech (GPT-3.5 and Llama 2), the study shows that humans predominantly use reason, whereas machine-generated counterspeech emphasizes emotion, and that reason is linked to more supportive replies. The authors also trace the flow of replies to counterspeech and show that persuasion-mode cues can serve as an explainability proxy for hate-detection models, while noting topic- and interaction-type-dependent variations and data-contamination concerns in large language models. Overall, the findings suggest that incorporating persuasion-mode signals into counterspeech modeling could improve interpretability and effectiveness in mitigating hate speech in real-world online settings.

Abstract

Examining the factors that counterspeech relies on is at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotion-based factors used in counterspeech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions, closed (multi-turn) and open (single-turn), concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors of human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. Machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what constitutes optimal counterspeech.

Paper Structure (24 sections, 6 figures, 7 tables)


Introduction
Previous Work
Counterspeech Aspects
Content Dynamics and Interaction Types
Persuasion Techniques in Conversations
Experimental Setup
Counterspeech Baseline Datasets
Persuasion Mode Labeling
Classifying Hate Speech and Counterspeech Using Persuasion Modes
Counterspeech Generation
Machine-Generated Counterspeech Labeling
Validation of Counter Generative Persuasion Labels
Results and Analysis
Interplay of Persuasion Modes in Hate Speech and Counterspeech in Each Type of Conversation Interaction, RQ1
Persuasion mode variations expressed in generated and human counterspeech, RQ2
...and 9 more sections

Figures (6)

Figure 1: Distribution of persuasion modes (in %) in counterspeech across overall and topic-specific contexts, comparing human, GPT-3, and LLaMA 2 responses in multi-turn (closed) and single-turn (open) interactions.
Figure 2: Entities identified by Riveter and power scores of personas across multi-turn (mT) and single-turn (1T) conversations for three topics (racism, sexism, and religious bigotry). A positive value indicates a stronger association with the persuasion mode.
Figure 3: Flow of persuasion modes between hate speech (HS) and counternarratives (CNs) in single-turn conversations (open interactions).
Figure 4: Correlation between the persuasion modes and the reply's stance.
Figure 5: Annotation example for persuasion modes, where all the turns are shown to facilitate the labeling of each turn of the conversation.
...and 1 more figure
