Table of Contents
Fetching ...

ParsCN: A Persian Dataset for Counter-Narrative Generation to Combat Online Hate Speech

Zahra Safdari Fesaghandis, Suman Kalyan Maity

Abstract

Online hate speech threatens online civility, particularly in low-resource and multilingual environments. Counter-narratives offer a promising solution by promoting constructive responses to hate speech. However, automatic counter-narrative generation is hindered by the lack of high-quality data for low-resource languages like Persian. To bridge this gap, we introduce ParsCN, the first and most comprehensive Persian counter-narrative dataset. Consisting of 1,100 hate speech and counter-narrative pairs, it provides fine-grained annotations across six target groups and six countering strategies, tailored to the socio-cultural context of Persian online discourse. We propose a novel, scalable multi-stage framework that integrates culturally-informed human annotation with few-shot LLM-augmented generation, guided by semantic retrieval and rigorous manual curation. This approach enables the creation of diverse, high-quality counter-narratives while significantly reducing annotation costs - establishing a replicable paradigm for other low-resource settings. Comprehensive human and automatic evaluations confirm the quality of the dataset and the effectiveness of the generated responses. Human-written counter-narratives achieved the highest scores for relevance (4.23), Effectiveness (4.21), fluency (4.92), and tone appropriateness (4.79), with GPT-4o and Claude closely following. Automatic evaluations show strong semantic alignment, high lexical diversity, and low toxicity across all sources. Finally, we conduct benchmark evaluations using mBART and PersianMind on a held-out test set. Results reveal that existing models struggle with fluency, cultural nuance, and safety - highlighting the need for Persian-specific resources like ParsCN. Our dataset serves as a foundational benchmark to advance research on Persian counter-narrative generation and foster safer, more inclusive digital spaces.

ParsCN: A Persian Dataset for Counter-Narrative Generation to Combat Online Hate Speech

Abstract

Online hate speech threatens online civility, particularly in low-resource and multilingual environments. Counter-narratives offer a promising solution by promoting constructive responses to hate speech. However, automatic counter-narrative generation is hindered by the lack of high-quality data for low-resource languages like Persian. To bridge this gap, we introduce ParsCN, the first and most comprehensive Persian counter-narrative dataset. Consisting of 1,100 hate speech and counter-narrative pairs, it provides fine-grained annotations across six target groups and six countering strategies, tailored to the socio-cultural context of Persian online discourse. We propose a novel, scalable multi-stage framework that integrates culturally-informed human annotation with few-shot LLM-augmented generation, guided by semantic retrieval and rigorous manual curation. This approach enables the creation of diverse, high-quality counter-narratives while significantly reducing annotation costs - establishing a replicable paradigm for other low-resource settings. Comprehensive human and automatic evaluations confirm the quality of the dataset and the effectiveness of the generated responses. Human-written counter-narratives achieved the highest scores for relevance (4.23), Effectiveness (4.21), fluency (4.92), and tone appropriateness (4.79), with GPT-4o and Claude closely following. Automatic evaluations show strong semantic alignment, high lexical diversity, and low toxicity across all sources. Finally, we conduct benchmark evaluations using mBART and PersianMind on a held-out test set. Results reveal that existing models struggle with fluency, cultural nuance, and safety - highlighting the need for Persian-specific resources like ParsCN. Our dataset serves as a foundational benchmark to advance research on Persian counter-narrative generation and foster safer, more inclusive digital spaces.

Paper Structure

This paper contains 43 sections, 3 figures, 10 tables.

Figures (3)

  • Figure 1: ParsCN Dataset Creation Pipeline
  • Figure 2: Examples of Hate Speech and Counter Narratives, Along with Their Target Groups and Counter Types, from ParsCN
  • Figure 3: Examples of Culturally Relevant Hate Speech and Counter-Narratives from MT-CONAN and HateXplain with English Translations