The Impact of Persona-based Political Perspectives on Hateful Content Detection

Stefano Civelli; Pietro Bernardelle; Gianluca Demartini

TL;DR

The paper investigates whether persona-based prompting can substitute for politically diverse pretraining in hate-speech detection. By mapping 200,000 PersonaHub personas onto the Political Compass Test and evaluating a vision-language model on Hateful Memes and MMHS150K, the authors test whether political positions influence classifications and whether explicit ideological labeling alters behavior. Across two studies, they find little correlation between political position and decisions, even with stronger ideological prompts, suggesting prompt-based approaches may not replicate the effects of political pretraining. The results imply that achieving fair performance in downstream hate-speech detection may require direct political pretraining or task-specific interventions rather than relying solely on prompts.

Abstract

While pretraining language models with politically diverse content has been shown to improve downstream task fairness, such approaches require significant computational resources often inaccessible to many researchers and organizations. Recent work has established that persona-based prompting can introduce political diversity in model outputs without additional training. However, it remains unclear whether such prompting strategies can achieve results comparable to political pretraining for downstream tasks. We investigate this question using persona-based prompting strategies in multimodal hate-speech detection tasks, specifically focusing on hate speech in memes. Our analysis reveals that when mapping personas onto a political compass and measuring persona agreement, inherent political positioning has surprisingly little correlation with classification decisions. Notably, this lack of correlation persists even when personas are explicitly injected with stronger ideological descriptors. Our findings suggest that while LLMs can exhibit political biases in their responses to direct political questions, these biases may have less impact on practical classification tasks than previously assumed. This raises important questions about the necessity of computationally expensive political pretraining for achieving fair performance in downstream tasks.
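
The core measurement described above, relating a persona's position on the political compass to its classification decisions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the data are synthetic, the field names (`economic`, `social`, `labels`) and helper functions are hypothetical, the two axes are assumed to range from -10 to 10, and Pearson correlation stands in for whatever agreement or correlation measure the paper actually uses.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def hateful_rate(labels):
    """Fraction of items a persona labeled as hateful (1 = hateful, 0 = not)."""
    return sum(labels) / len(labels)


def make_synthetic_personas(n_personas=200, n_items=50, seed=0):
    """Synthetic stand-in data: labels are drawn independently of political position."""
    rng = random.Random(seed)
    personas = []
    for _ in range(n_personas):
        personas.append({
            "economic": rng.uniform(-10, 10),  # assumed left/right axis
            "social": rng.uniform(-10, 10),    # assumed libertarian/authoritarian axis
            "labels": [1 if rng.random() < 0.4 else 0 for _ in range(n_items)],
        })
    return personas


personas = make_synthetic_personas()
rates = [hateful_rate(p["labels"]) for p in personas]

# Correlate each compass axis with the per-persona rate of "hateful" decisions.
r_economic = pearson([p["economic"] for p in personas], rates)
r_social = pearson([p["social"] for p in personas], rates)
print(f"economic axis vs hateful rate: r = {r_economic:.3f}")
print(f"social axis vs hateful rate:   r = {r_social:.3f}")
```

Because the synthetic labels are generated without reference to the coordinates, any correlation printed here reflects sampling noise only. With real model outputs, the same computation would show whether a persona's political position tracks its classification behavior.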
