A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models

Jaylen Jones; Lingbo Mo; Eric Fosler-Lussier; Huan Sun

TL;DR

Counter narratives are a key tool for de-escalating hate speech, but generated responses have so far been evaluated with reference-based metrics that align poorly with human judgments. The authors propose a multi-aspect evaluation framework that prompts LLMs to score counter narratives along five aspects derived from NGO guidelines, yielding a reference-free, interpretable evaluation. In a validation on 180 hate speech/counter narrative pairs, LLM evaluators align more closely with human judgments collected via Amazon Mechanical Turk (AMT) than traditional metrics such as BLEU or ROUGE do, and multi-aspect scoring improves performance for open-source models. The approach offers scalable, socially informed evaluation for counter narrative generation and other hate speech interventions, with practical implications for deploying automated evaluation in real-world settings.

Abstract

Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.
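
The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the 1-5 score range, the `Score: N` reply format, the function names, and the choice of Pearson correlation as the agreement measure are all assumptions made for illustration. The aspect name and rubric are passed in as parameters because the five aspects are not listed here.

```python
import re
from math import sqrt


def build_prompt(task_description, aspect, rubric, hate_speech, counter_narrative):
    """Assemble a reference-free evaluation prompt for a single aspect.

    The layout (task description, rubric, then the pair) is an assumed format.
    """
    return (
        f"{task_description}\n\n"
        f"Aspect: {aspect}\n"
        f"Score rubric:\n{rubric}\n\n"
        f"Hate speech: {hate_speech}\n"
        f"Counter narrative: {counter_narrative}\n\n"
        "Reply in the form 'Score: <1-5>' followed by brief feedback."
    )


def parse_score(response, low=1, high=5):
    """Return the first 'Score: N' found in an LLM reply.

    Returns None when no score is present or it falls outside [low, high].
    The 1-5 range is an assumption, not taken from the paper.
    """
    match = re.search(r"Score:\s*(\d+)", response)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)
```

In use, one would call `build_prompt` once per aspect and per hate speech/counter narrative pair, send the prompt to an LLM of choice, collect `parse_score` outputs, and pass them to `pearson` together with the corresponding human-annotated scores.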

Paper Structure (20 sections, 11 figures, 18 tables)

Introduction
Related Work
Data and Methodology
Results
Evaluation Metric Correlation
Fine-grained Analysis
Qualitative Evaluation
Conclusion
Ethical Considerations
Limitations
Acknowledgements
Counter Narrative Generation
DialoGPT Implementation
Prompting/API details
BARTScore details
...and 5 more sections

Figures (11)

Figure 1: Example of our multi-aspect counter narrative evaluation framework.
Figure 2: Validation pipeline for our counter narrative evaluation framework. (Left) Evaluation prompt template including task description, a ChatGPT-generated aspect score rubric, and hate speech/counter narrative pair. (Right) LLM evaluation scores are generated for counter narratives and are compared to AMT-annotated evaluation.
Figure C.1: Counter narrative generation prompt.
Figure C.2: Score rubric prompt.
Figure C.3: Counter narrative evaluation prompt.
...and 6 more figures
