Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech
Ghadi Alyahya, Abeer Aldayel
TL;DR
This work investigates how three coarse persuasion modes—reason, emotion, and credibility—manifest in counterspeech aimed at countering online hate across closed (multi-turn) and open (single-turn) conversations. By annotating two major datasets (DialogConan for closed dialogs and Albanyan for open posts) and comparing human versus machine-generated counterspeech (GPT-3.5 and Llama 2), the study shows that humans predominantly deploy reason, while machine-generated counterspeech emphasizes emotion, with reason linked to more supportive replies. The authors also explore the flow of replies to counterspeech and demonstrate that persuasion-mode cues can serve as an explainability proxy for hate-detection models, while highlighting topic- and interaction-type-dependent variations and data-contamination concerns in large language models. Overall, the findings suggest incorporating persuasion-mode signals into counterspeech modeling to improve interpretability and effectiveness in mitigating hate speech. These insights offer a path toward more nuanced, explainable, and potentially more effective counterspeech systems in real-world online settings.
Abstract
Examining the factors that counterspeech employs is at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotion-based factors used in counterspeech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions, closed (multi-turn) and open (single-turn), concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors of human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. Machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what constitutes the optimal counterspeech.
