CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs

Amey Hengle, Aswini Kumar, Anil Bandhakavi, Tanmoy Chakraborty

TL;DR

The paper addresses the evaluation of automatically generated counterspeech (CS) with two contributions: CSEval, a large, expert-annotated dataset that rates counterspeech on four quality dimensions, and Auto-CSEval, a prompt-based, auto-calibrated LLM evaluator for reference-free scoring. The experiments show that traditional similarity metrics approximate human judgments poorly, while Auto-CSEval correlates more strongly with human ratings of relevance, aggressiveness, coherence, and suitableness, especially when GPT-4 is used with calibration. Key contributions include the CSEval dataset (7,926 model-generated CS and 4,318 ground-truth CS across 2,223 hate speech (HS) instances), a detailed annotation protocol with inter-annotator agreement analysis, and a two-phase calibration procedure that optimizes chain-of-thought (CoT) evaluation instructions against human judgments. The findings point to scalable, human-aligned evaluation for counterspeech research, with practical implications for benchmarking and for developing more effective automated counterspeech systems.

Abstract

Counterspeech has emerged as a popular and effective strategy for combating online hate speech, sparking growing research interest in automating its generation using language models. However, the field still lacks standardised evaluation protocols and reliable automated evaluation metrics that align with human judgement. Current automatic evaluation methods, primarily based on similarity metrics, do not effectively capture the complex and independent attributes of counterspeech quality, such as contextual relevance, aggressiveness, or argumentative coherence. This has led to an increased dependency on labor-intensive human evaluations to assess automated counterspeech generation methods. To address these challenges, we introduce CSEval, a novel dataset and framework for evaluating counterspeech quality across four dimensions: contextual-relevance, aggressiveness, argument-coherence, and suitableness. Furthermore, we propose Auto-Calibrated COT for Counterspeech Evaluation (Auto-CSEval), a prompt-based method with auto-calibrated chain-of-thought (CoT) instructions for scoring counterspeech using large language models. Our experiments show that Auto-CSEval outperforms traditional metrics like ROUGE, METEOR, and BERTScore in correlating with human judgement, indicating a significant improvement in automated counterspeech evaluation.
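As a rough illustration of what "correlating with human judgement" can mean in practice, the sketch below computes a rank correlation between an automatic metric's scores and human ratings for the same counterspeech samples. The text above does not say which correlation coefficient is used, so the choice of Spearman's rho is an assumption, and the function names and the toy scores are hypothetical placeholders rather than values from CSEval.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(metric_scores, human_scores):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only (not CSEval data).
    human_relevance = [1, 2, 3, 4, 5]
    similarity_metric = [0.42, 0.40, 0.35, 0.37, 0.30]
    llm_evaluator = [1, 2, 2, 4, 5]
    print("similarity metric vs human:", spearman(similarity_metric, human_relevance))
    print("LLM evaluator vs human:", spearman(llm_evaluator, human_relevance))
```

Under this reading, a metric is considered better aligned with human judgement when its rho with the human ratings is higher, computed separately for each of the four dimensions.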

Paper Structure

This paper contains 31 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: An example comparing classical evaluation (ROUGE, METEOR, etc.) with LLM-based multidimensional evaluation for two counterspeech responses, A and B. Although B is more relevant and addresses the implied bias expressed in the hate speech, traditional metrics score it lower than A. In contrast, LLM-based multidimensional evaluation aligns more closely with human judgement, scoring B higher than A.
  • Figure 2: An overview of the multi-phase auto-calibration framework of Auto-CSEval, including the generation and refinement of CoT instructions, evaluation criteria formulation, and the iterative calibration process aligned with expert human judgement.
  • Figure 3: Histograms of standard deviations of inter-annotator scores between first-round expert annotations and second-round expert annotations.
  • Figure 4: Prompt template: Candidate CoT drafting.
  • Figure 5: Prompt template: Scoring a counterspeech.
  • ...and 1 more figures