COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport

Linhao Zhang; Li Jin; Guangluan Xu; Xiaoyu Li; Xian Sun

TL;DR

This paper tackles automatic generation of hate-speech counter-narratives by introducing Contrastive Optimal Transport (COT). It fuses a transformer-based encoder with an Optimal Transport Kernel to inject hatred-target information, a self-contrastive objective to mitigate model degeneration, and a target-oriented decoding strategy to promote domain relevance and diversity. Empirical results on CONAN and Reddit show COT outperforms strong baselines across multiple metrics and yields higher relevance with lower toxicity, reinforced by ablations and human evaluations. The approach advances controllable CN generation with explicit target interaction and diversified outputs, offering practical potential for real-world interventions while maintaining freedom of expression.

Abstract

Counter-narratives, which are direct responses consisting of non-aggressive fact-based arguments, have emerged as a highly effective approach to combat the proliferation of hate speech. Previous methods have primarily focused on fine-tuning and post-editing techniques to ensure the fluency of generated content, while overlooking individualization and relevance with respect to specific hatred targets, such as LGBT groups and immigrants. This paper introduces a novel framework based on contrastive optimal transport, which addresses the challenges of maintaining target interaction and promoting diversification in counter-narrative generation. First, an Optimal Transport Kernel (OTK) module incorporates hatred-target information into the token representations, with comparison pairs extracted between the original and transported features. Second, a self-contrastive learning module addresses model degeneration by generating an anisotropic distribution of token representations. Finally, a target-oriented search method serves as an improved decoding strategy that explicitly promotes domain relevance and diversification during inference by modifying the model's confidence score according to both token similarity and target relevance. Quantitative and qualitative experiments on two benchmark datasets demonstrate that the proposed model significantly outperforms current methods on metrics covering multiple aspects.
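
The abstract states only that the target-oriented search adjusts the model's confidence score using token similarity and target relevance; it does not give the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes a linear combination in which similarity to already generated tokens is penalized and similarity to the hatred-target representation is rewarded. The function names, the weights `alpha` and `beta`, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def target_oriented_score(prob, cand_vec, context_vecs, target_vec,
                          alpha=0.3, beta=0.2):
    """Rescore one candidate token (assumed form, not taken from the paper).

    prob         -- model confidence for the candidate
    cand_vec     -- representation of the candidate token
    context_vecs -- representations of tokens generated so far
    target_vec   -- representation of the hatred target
    alpha, beta  -- hypothetical weights for the penalty and the reward
    """
    # Penalty: highest similarity to any previous token (discourages repetition).
    repetition = max((cosine(cand_vec, h) for h in context_vecs), default=0.0)
    # Reward: similarity to the target representation (encourages relevance).
    relevance = cosine(cand_vec, target_vec)
    return (1 - alpha - beta) * prob - alpha * repetition + beta * relevance


def pick_next(candidates, context_vecs, target_vec, alpha=0.3, beta=0.2):
    """Return the token with the best adjusted score.

    candidates -- list of (token, prob, vector) tuples, e.g. the top-k tokens
    """
    best = max(
        candidates,
        key=lambda c: target_oriented_score(
            c[1], c[2], context_vecs, target_vec, alpha, beta
        ),
    )
    return best[0]


# Toy usage: "again" is more probable but repeats the context,
# while "rights" aligns with the target direction.
context = [[1.0, 0.0]]
target = [0.0, 1.0]
top_k = [("again", 0.6, [1.0, 0.0]), ("rights", 0.4, [0.0, 1.0])]
chosen = pick_next(top_k, context, target)
```

With both weights set to zero the rule reduces to greedy selection over the candidate set, which makes the effect of each term easy to isolate when tuning.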

Paper Structure (27 sections, 26 equations, 7 figures, 17 tables, 1 algorithm)

Introduction
Related Work
Hate Speech Countering
Counter-Narrative
Optimal Transport
Problem Definition
Proposed Methodology
Transformer-style Encoder
Contrastive Optimal Transport
Decoding Strategy
Optimization Process
Experiments
Datasets
Evaluation Metrics
Baselines
...and 12 more sections

Figures (7)

Figure 1: Examples of methods for combating hate, including traditional methods and counter-narrative intervention. The displayed counter-narratives are generated by baselines and our proposed COT model.
Figure 2: The workflow of the proposed COT: (1) The hatred target, hate speech sentence, and counter-narrative sentence are fed into transformer-style encoders, which use an embedding matrix for word vectorization and transformer layers with causal attention for interaction; (2) The encoded representations are forwarded into the two proposed modules to calibrate the representation space, which is realized through the proposed contrastive objectives $\mathcal{L}_{T}$ and $\mathcal{L}_{C}$; (3) During training, the representations are decoded into probabilities over the whole vocabulary to calculate the standard MLE loss; during testing, the representations are decoded into natural words through the proposed strategy.
Figure 3: Hatred target distributions in CONAN.
Figure 4: Model performance with different penalty weights in the decoding method.
Figure 5: Embedding space visualization of COT with different combinations of training objectives.
...and 2 more figures
