PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions

Greta Damo, Stéphane Petiot, Elena Cabrio, Serena Villata

TL;DR

PEACE 2.0 tackles the dual challenge of detecting hate speech (HS) and producing actionable responses to it by grounding both explanations and counter-speech (CS) in a curated knowledge base through a Retrieval-Augmented Generation (RAG) pipeline. It extends the earlier PEACE tool with three core capabilities: knowledge-grounded CS generation, evidence-grounded explanations for HS predictions, and visual analytics for analysing counter-speech. The system combines a BERT-based detector, BGE-M3 embeddings, FAISS retrieval, and a knowledge base of 32,792 documents drawn from the UN, Eur-Lex, and FRA, so that both explanations and responses are conditioned on retrieved evidence. Human and automatic evaluations show that RAG-grounded outputs are more informative, more persuasive, and more faithful to the retrieved evidence for both explicit and implicit hate speech, supporting their use in e-democracy and online moderation.
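The retrieval step of such a pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the PEACE 2.0 implementation: a hashed bag-of-words encoder stands in for BGE-M3, a brute-force inner-product search stands in for FAISS, and `embed`, `FlatInnerProductIndex` and the placeholder passages are all hypothetical.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical embedding width for the toy encoder


def embed(text: str) -> list[float]:
    """Toy stand-in for a dense encoder such as BGE-M3 (assumption, not the real model):
    a hashed bag-of-words vector, L2-normalised so inner product equals cosine similarity."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatInnerProductIndex:
    """Exact (brute-force) inner-product search, similar in spirit to a flat FAISS index."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, docs: list[str]) -> None:
        # Encode each document once and store its vector alongside the text.
        for doc in docs:
            self.docs.append(doc)
            self.vectors.append(embed(doc))

    def search(self, query: str, k: int) -> list[tuple[str, float]]:
        # Score every stored vector against the query and keep the k best.
        q = embed(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [(self.docs[i], scores[i]) for i in top]


# Hypothetical placeholder passages; the real knowledge base holds 32,792 documents.
kb = FlatInnerProductIndex()
kb.add([
    "Placeholder passage about non-discrimination law and equal treatment.",
    "Placeholder passage about migration statistics in Europe.",
    "Placeholder passage about online platform moderation rules.",
])
results = kb.search("equal treatment and non-discrimination", k=2)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

Normalising the vectors makes the inner product equal to cosine similarity, a common way to use dense embeddings with an inner-product index; a real deployment would encode the corpus with the actual embedding model and let the index library handle search at scale.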

Abstract

The increasing volume of hate speech (HS) on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect hate speech, generating responses to it, known as counter-speech, remains an open challenge. We present PEACE 2.0, a novel tool that not only analyses and explains why a message is or is not considered hateful but also generates a response to it. More specifically, PEACE 2.0 adds three main functionalities: it leverages a Retrieval-Augmented Generation (RAG) pipeline i) to ground HS explanations in evidence and facts and ii) to automatically generate evidence-grounded counter-speech, and iii) it supports exploring the characteristics of counter-speech replies. By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.
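The first two functionalities differ mainly in what the generator is asked to do with the same retrieved evidence. The sketch below illustrates one way to condition both outputs on retrieved passages. The `Passage` type, `build_prompt` function and prompt wording are hypothetical, and it assumes, as a simplification, that a counter-speech reply is requested only when the detector labels a message hateful.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. "UN", "Eur-Lex" or "FRA"
    text: str


def format_evidence(passages: list[Passage]) -> str:
    # Number each passage so the generator can cite it as [1], [2], ...
    return "\n".join(f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1))


def build_prompt(message: str, is_hateful: bool, passages: list[Passage]) -> str:
    """Hypothetical prompt template that conditions generation on retrieved evidence."""
    if is_hateful:
        task = ("Explain why the message is hateful, citing the evidence by number. "
                "Then write a respectful counter-speech reply that relies only on the evidence.")
    else:
        task = "Explain why the message is not hateful, citing the evidence by number."
    return (f"Evidence:\n{format_evidence(passages)}\n\n"
            f"Message:\n{message}\n\n"
            f"Task:\n{task}")


evidence = [Passage("FRA", "Placeholder passage on equal treatment.")]
print(build_prompt("<user message>", is_hateful=True, passages=evidence))
```

Numbering the passages gives the generator a simple way to cite its sources, which makes it easier to check whether an explanation or reply is faithful to the retrieved evidence.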

Paper Structure (10 sections, 2 tables)