A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun
TL;DR
Counter narratives are a crucial tool for de-escalating hate speech, but evaluation of generated responses has relied on reference-based metrics that align poorly with human judgments. The authors propose a multi-aspect evaluation framework that prompts LLMs to score counter narratives along five aspects derived from NGO guidelines, yielding a reference-free, interpretable evaluation. Validation on 180 hate speech/counter narrative pairs shows that LLM evaluators align more closely with judgments from human AMT annotators than traditional metrics such as BLEU or ROUGE do, and that multi-aspect scoring improves performance for open-source models. The approach offers scalable, socially informed evaluation for counter narrative generation and other hate speech interventions, with practical implications for deploying automated evaluation in real-world settings.
Abstract
Counter narratives, informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters, have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment because they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address these limitations, we propose a novel evaluation framework that prompts LLMs to provide scores and feedback for generated counter narrative candidates using five defined aspects derived from the guidelines of NGOs specializing in counter narratives. We found that LLM evaluators achieve strong alignment with human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.
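
As an illustration of what "alignment with human-annotated scores" can mean in practice, the sketch below computes a rank correlation between an automatic evaluator's scores and human scores for the same candidates. The text above does not say which statistic is used, so the choice of Spearman correlation is an assumption, and the function names and the toy score lists are hypothetical and do not come from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for five candidates on one aspect (illustrative only).
human_scores = [1, 2, 3, 4, 5]
evaluator_scores = [2, 1, 3, 5, 4]
rho = spearman(human_scores, evaluator_scores)
print(f"Spearman correlation: {rho:.2f}")
```

In a multi-aspect setting, one would presumably compute such a correlation separately for each aspect, so that an evaluator's agreement with humans can be inspected aspect by aspect rather than through a single aggregate number.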
