ReZG: Retrieval-Augmented Zero-Shot Counter Narrative Generation for Hate Speech

Shuyu Jiang, Wenyi Tang, Xingshu Chen, Rui Tang, Haizhou Wang, Wenxian Wang

TL;DR

ReZG tackles the challenge of generating highly specific counter narratives (CNs) for unseen hate speech (HS) targets by coupling a knowledge retrieval module with an energy-based generator. It introduces a multi-dimensional SSF (Stance, Semantics, Fitness) retrieval method that extracts relevant counter-knowledge from an external CMV-based repository. During decoding, it uses differentiable constraints so that the generated CN preserves the retrieved knowledge, counters the HS, and stays fluent, all in a zero-shot setting. Empirical results show that ReZG outperforms strong supervised and zero-shot baselines in both automatic and human evaluations, with notable gains in relevance and countering success rate, and that it generalizes well to unseen HS targets. The approach demonstrates the value of external knowledge and constrained decoding for knowledge-intensive text generation and offers a scalable way to generate targeted CNs without extensive target-specific annotation.
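The SSF retrieval step can be illustrated with a toy sketch. It assumes a hierarchical order (stance filter, then semantic ranking, then fitness re-ranking), which fits the description of the method as multi-dimensional and hierarchical but does not reproduce the authors' actual scoring functions. `Candidate`, `jaccard`, `ssf_retrieve`, the stance scores, the thresholds, and the fitness callable are all hypothetical: token-set Jaccard overlap stands in for a semantic similarity model, the `stance` value stands in for a stance detector's output, and fitness is passed in as a callable because it is not defined here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    text: str
    stance: float  # hypothetical score in [0, 1]: how strongly the sentence opposes the HS


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a cheap stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def ssf_retrieve(
    hs: str,
    candidates: Sequence[Candidate],
    fitness: Callable[[str, str], float],
    stance_threshold: float = 0.5,
    top_k_semantic: int = 3,
    top_n: int = 1,
) -> List[str]:
    """Hierarchical Stance -> Semantics -> Fitness selection (toy version)."""
    # Stage 1 (Stance): discard sentences that do not oppose the HS.
    opposing = [c for c in candidates if c.stance >= stance_threshold]
    # Stage 2 (Semantics): keep the k opposing sentences closest to the HS.
    semantic = sorted(opposing, key=lambda c: jaccard(hs, c.text), reverse=True)
    semantic = semantic[:top_k_semantic]
    # Stage 3 (Fitness): re-rank the survivors and return the best top_n.
    ranked = sorted(semantic, key=lambda c: fitness(hs, c.text), reverse=True)
    return [c.text for c in ranked[:top_n]]


if __name__ == "__main__":
    hs = "immigrants are stealing our jobs"
    pool = [
        Candidate("immigrants create jobs by starting new businesses", 0.9),
        Candidate("immigrants are stealing our jobs and homes", 0.1),
        Candidate("the weather was nice today", 0.6),
        Candidate("studies find immigrants do not take jobs from natives", 0.8),
    ]

    # Placeholder fitness (hypothetical): prefer longer, more informative sentences.
    def word_count(_hs: str, s: str) -> float:
        return float(len(s.split()))

    print(ssf_retrieve(hs, pool, word_count, top_k_semantic=2))
```

In this example the second candidate has the highest lexical overlap with the HS but is dropped at the stance stage because it agrees with the HS. The fitness stage then reorders the two semantic survivors. This illustrates why a single similarity metric alone is a poor fit for retrieving refuting knowledge.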

Abstract

The proliferation of hate speech (HS) on social media poses a serious threat to societal security. Automatic counter narrative (CN) generation, as an active strategy for HS intervention, has attracted increasing attention in recent years. Existing methods for automatically generating CNs mainly rely on re-training or fine-tuning pre-trained language models (PLMs) on human-curated CN corpora. Unfortunately, the annotation of CN corpora cannot keep pace with the emergence of new HS targets, and generating specific and effective CNs for unseen targets remains a significant challenge for these models. To tackle this issue, we propose Retrieval-Augmented Zero-shot Generation (ReZG) to generate highly specific CNs for unseen targets. Specifically, we propose a multi-dimensional hierarchical retrieval method that integrates stance, semantics, and fitness, extending the retrieval metric from a single dimension to multiple dimensions suited to knowledge that refutes HS. We then implement an energy-based constrained decoding mechanism that enables PLMs to use differentiable knowledge preservation, countering, and fluency constraint functions, instead of in-target CNs, as control signals for generation, thereby achieving zero-shot CN generation. With these techniques, ReZG can flexibly integrate external knowledge and improve the specificity of CNs. Experimental results show that ReZG exhibits stronger generalization and outperforms strong baselines, with significant improvements of more than 2.0% in relevance and more than 4.5% in countering success rate.
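The energy-based constrained decoding idea can be illustrated with a deliberately simplified sketch. The output is kept as continuous logits $\widetilde{y}$ and updated by gradient steps plus Gaussian noise (a Langevin-style update) on a weighted sum of constraint energies, and only then discretized. Assumptions, loudly: the three constraints are reduced to hypothetical per-token scores (overlap with retrieved knowledge, a countering score, and an LM log-probability for fluency), so each position's energy is linear in the softmax of its logits and has a closed-form gradient. The real constraints (sentence-level similarity, a countering model, bidirectional LM fluency) couple positions and would need automatic differentiation. `combined_weights`, `energy`, `energy_grad`, `langevin_decode`, `lambdas`, and all numbers below are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_weights(
    knowledge: Sequence[float],
    countering: Sequence[float],
    lm_logprob: Sequence[float],
    lambdas: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[float]:
    """Hypothetical per-token reward; a higher reward means a lower energy.

    knowledge  -- stand-in for knowledge preservation (overlap with retrieved text)
    countering -- stand-in for a countering score
    lm_logprob -- stand-in for fluency (log-probability under a language model)
    """
    l_sim, l_cc, l_flu = lambdas
    return [l_sim * k + l_cc * c + l_flu * f
            for k, c, f in zip(knowledge, countering, lm_logprob)]


def energy(logits: Sequence[float], weights: Sequence[float]) -> float:
    """Toy energy for one position: E = -sum_v softmax(logits)_v * w_v."""
    p = softmax(logits)
    return -sum(pv * wv for pv, wv in zip(p, weights))


def energy_grad(logits: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Analytic gradient of the toy energy: dE/dy_j = -p_j * (w_j - sum_v p_v * w_v)."""
    p = softmax(logits)
    w_bar = sum(pv * wv for pv, wv in zip(p, weights))
    return [-pj * (wj - w_bar) for pj, wj in zip(p, weights)]


def langevin_decode(
    position_weights: Sequence[Sequence[float]],
    steps: int = 200,
    lr: float = 0.5,
    noise: float = 0.01,
    seed: int = 0,
) -> List[int]:
    """Noisy gradient descent on continuous logits, then argmax at each position."""
    rng = random.Random(seed)
    vocab_size = len(position_weights[0])
    # Continuous representation y~: one logit vector per output position, initialized to 0.
    y = [[0.0] * vocab_size for _ in position_weights]
    for _ in range(steps):
        for t, w in enumerate(position_weights):
            g = energy_grad(y[t], w)
            # Step toward lower energy, then add Gaussian noise.
            y[t] = [yj - lr * gj + rng.gauss(0.0, noise) for yj, gj in zip(y[t], g)]
    # Discretize: choose the highest-logit token at each position.
    return [max(range(vocab_size), key=row.__getitem__) for row in y]


if __name__ == "__main__":
    vocab = ["people", "criminals", "contribute", "burdens"]
    w0 = combined_weights([1, 0, 0, 0], [0.5, -1, 0.2, 0.1], [-1.0, -1.2, -2.0, -2.5])
    w1 = combined_weights([0, 0, 1, 0], [0, -1, 0.5, 0.3], [-2.0, -1.0, -1.5, -2.0])
    print([vocab[i] for i in langevin_decode([w0, w1])])
```

The point of the continuous relaxation is that every constraint only needs to be differentiable with respect to $\widetilde{y}$. No in-target CN is required as a training signal, and the weights in `lambdas` trade off knowledge preservation, countering, and fluency.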

Paper Structure (38 sections, 13 equations, 9 figures, 13 tables, 1 algorithm)

Figures (9)

  • Figure 1: ReZG method workflow. Blue words represent the retrieved counter-knowledge, and yellow words represent the refuted points of HS.
  • Figure 2: ReZG framework. $s$ denotes a sentence extracted from the filtered counter-comments. $x$, $y_{*}$, and $\widetilde{y}$ represent the HS, the retrieved counter-knowledge, and the continuous representation of the generated CN, respectively. $f_{\operatorname{sim}}(\cdot)$, $f_{\operatorname{cc}}(\cdot)$, and $f_{\overleftarrow{\overrightarrow{\mathrm{flu}}}}(\cdot)$ in the generator denote the knowledge preservation, countering, and fluency constraints, respectively; these are added to the energy function $E(\widetilde{y})$.
  • Figure 3: An illustration of the information selection process in the SSF algorithm.
  • Figure 4: Relative SROC and relevance scores of in-domain and out-of-domain models. The abscissa represents the test domain of each target. Rel$_{OD}$ and Rel$_{ID}$ denote the relevance scores of the out-of-domain and in-domain models, respectively; SROC$_{OD}$ and SROC$_{ID}$ denote the SROC scores of the out-of-domain and in-domain models, respectively.
  • Figure 5: Distribution of positive and negative samples with respect to stance consistency and semantic overlap rate.
  • ...and 4 more figures