On Zero-Shot Counterspeech Generation by LLMs
Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann, Animesh Mukherjee
TL;DR
This work investigates the intrinsic zero-shot capabilities of four large language models (GPT-2, DialoGPT, FlanT5, and ChatGPT) for counterspeech generation, comparing model sizes and introducing three prompting strategies to induce type-specific outputs. Across four hate-speech datasets (CONAN, CONAN-MT, Reddit, Gab), the authors use a broad suite of automatic metrics and classifiers to evaluate generation quality, toxicity, and argumentative characteristics. ChatGPT emerges as the strongest performer overall, while larger models can amplify toxicity. Manual prompts often outperform automatic strategies for controlling counterspeech types, yet certain types remain difficult to generate reliably, especially for models that are not chat-oriented. These findings show both the potential and the limitations of zero-shot counterspeech generation, and they underscore the need for human-in-the-loop systems and robust prompting when such models are deployed in practice.
Abstract
With the emergence of numerous Large Language Models (LLMs), the use of such models in Natural Language Processing (NLP) applications is growing rapidly. Counterspeech generation is one such key task: existing efforts develop generative models by fine-tuning LLMs on hate speech-counterspeech pairs, but none of them explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performance of four LLMs, namely GPT-2, DialoGPT, ChatGPT and FlanT5, in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate how performance varies with model size (small, medium, large). In addition, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of these strategies on the performance of the models. Our analysis shows that generation quality improves for two datasets (17%) as model size increases, but toxicity also increases (25%). Across model types, GPT-2 and FlanT5 are significantly better than DialoGPT in terms of counterspeech quality but also have higher toxicity. ChatGPT is much better at generating counterspeech than the other models across all metrics. In terms of prompting, we find that our proposed strategies help improve counterspeech generation across all the models.
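As a rough illustration of what a zero-shot, type-specific prompt might look like, the sketch below builds a prompt string from a post and an optional counterspeech type. It is not the authors' method: the function name `build_prompt`, the prompt wording, and the example type label are all hypothetical, and the abstract does not specify the three proposed strategies in enough detail to reproduce them here.

```python
from typing import Optional


def build_prompt(post: str, cs_type: Optional[str] = None) -> str:
    """Build a zero-shot counterspeech prompt.

    Hypothetical template for illustration only; the wording is an
    assumption and is not taken from the paper.
    """
    # Collapse whitespace so multi-line posts fit on one prompt line.
    text = " ".join(post.split())
    if cs_type:
        # Type-specific variant: name the desired counterspeech type.
        instruction = f"Write a {cs_type} counterspeech reply to the following post."
    else:
        # Plain zero-shot variant: no type constraint.
        instruction = "Write a counterspeech reply to the following post."
    return f"{instruction}\nPost: {text}\nReply:"


if __name__ == "__main__":
    # "fact-based" is an illustrative type label, not one confirmed by the source.
    print(build_prompt("An example   offensive post.", cs_type="fact-based"))
```

The resulting string could then be passed to any of the models under study; no fine-tuning or in-context examples are involved, which is what makes the setting zero-shot.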
