PANDA -- Paired Anti-hate Narratives Dataset from Asia: Using an LLM-as-a-Judge to Create the First Chinese Counterspeech Dataset

Michael Bennie, Demi Zhang, Bushi Xiao, Jing Cao, Chryseis Xinyi Liu, Jian Meng, Alayo Tripp

TL;DR

PANDA (Paired Anti-hate Narratives Dataset from Asia) addresses the shortage of Chinese counterspeech data by constructing a Mandarin HS–CS paired corpus using an LLM-as-Judge pipeline, simulated annealing, zero-shot CN generation, and round-robin evaluation, followed by manual validation. The work highlights significant mislabeling in existing HS datasets and reveals biases in automated evaluation, where AI-generated CS often outperforms human-edited responses in JudgeLM rankings. While achieving a first-of-its-kind East Asian HS–CS resource and releasing data under GPL, the study also exposes scalability and annotator-diversity challenges, suggesting improvements in evaluation alignment and broader language coverage. The PANDA dataset provides a foundation for future CS generation and evaluation in non-Eurocentric languages, enabling more culturally grounded and globally inclusive hate-speech interventions.

Abstract

Despite the global prevalence of Modern Standard Chinese, counterspeech (CS) resources for Chinese remain virtually nonexistent. To address this gap in East Asian counterspeech research, we introduce a corpus of Modern Standard Mandarin counterspeech that focuses on combating hate speech in Mainland China. This paper proposes a novel approach to generating CS that combines an LLM-as-a-Judge, simulated annealing, zero-shot counter-narrative (CN) generation by LLMs, and a round-robin algorithm, followed by manual verification for quality and contextual relevance. The paper details the methodology for creating effective counterspeech in Chinese and other non-Eurocentric languages, including unique cultural patterns in which groups are maligned and linguistic patterns in which discourse markers are programmatically marked as hate speech (HS). Through analysis of the generated corpora, we provide strong evidence for the lack of open-source, properly labeled Chinese hate speech data and for the limitations of using an LLM-as-a-Judge to score possible answers in Chinese. Moreover, the present corpus serves as the first CS corpus for an East Asian language and provides an essential resource for future research on counterspeech generation and evaluation.
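
The abstract names a round-robin algorithm driven by an LLM-as-a-Judge but does not spell out the procedure here. The following is only a minimal sketch of one plausible reading, in which every pair of candidate replies is compared once and candidates are ordered by wins. The function names, the judge signature, and the length-based stand-in judge are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical judge interface (an assumption, not the paper's API):
# given the hate-speech text and two candidate replies, return one score each.
Judge = Callable[[str, str, str], Tuple[float, float]]


def round_robin_rank(hate_speech: str, candidates: List[str], judge: Judge) -> List[str]:
    """Rank candidates by pairwise wins; every pair is judged exactly once."""
    wins = [0.0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        score_i, score_j = judge(hate_speech, candidates[i], candidates[j])
        if score_i > score_j:
            wins[i] += 1.0
        elif score_j > score_i:
            wins[j] += 1.0
        else:
            # A tie gives half a win to each side.
            wins[i] += 0.5
            wins[j] += 0.5
    # Sort by wins (descending); ties keep the original candidate order.
    order = sorted(range(len(candidates)), key=lambda k: (-wins[k], k))
    return [candidates[k] for k in order]


def length_judge(hate_speech: str, a: str, b: str) -> Tuple[float, float]:
    """Placeholder judge that simply prefers the longer reply.

    This stands in for a real model call (for example, to JudgeLM) purely so
    the sketch runs offline; it says nothing about counterspeech quality.
    """
    return float(len(a)), float(len(b))


if __name__ == "__main__":
    ranked = round_robin_rank("<hate speech text>", ["aa", "a", "aaa"], length_judge)
    print(ranked)
```

In practice the placeholder judge would be replaced by a call to the actual judging model, and the top-ranked candidates would then go to human annotators for verification, as the abstract describes.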

Paper Structure (24 sections, 4 figures, 4 tables)

This paper contains 24 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Proposed Data Processing Pipeline for Creating the Chinese Counterspeech Corpus. A$_1$ through A$_n$ refer to $n$ annotators that participated in this project.
  • Figure 2: The scoring heat-map based on different combinations of minimum hate-speech score (y) and minimum length of each string (x).
  • Figure 3: The distribution of human labeling on hate-speech that has already been processed. This was generated from the first 785 instances of collected data.
  • Figure 4: A histogram showing how human-preferred/written answers ranked against AI-generated answers. This was generated from the first 785 instances of collected data.