The Impact of Persona-based Political Perspectives on Hateful Content Detection

Stefano Civelli, Pietro Bernardelle, Gianluca Demartini

TL;DR

The paper investigates whether persona-based prompting can substitute for politically diverse pretraining in hate-speech detection. By mapping 200,000 PersonaHub personas onto the Political Compass Test and evaluating a vision-language model on Hateful Memes and MMHS150K, the authors test whether political positions influence classifications and whether explicit ideological labeling alters behavior. Across two studies, they find little correlation between political position and decisions, even with stronger ideological prompts, suggesting prompt-based approaches may not replicate the effects of political pretraining. The results imply that achieving fair performance in downstream hate-speech detection may require direct political pretraining or task-specific interventions rather than relying solely on prompts.

Abstract

While pretraining language models with politically diverse content has been shown to improve downstream task fairness, such approaches require significant computational resources often inaccessible to many researchers and organizations. Recent work has established that persona-based prompting can introduce political diversity in model outputs without additional training. However, it remains unclear whether such prompting strategies can achieve results comparable to political pretraining for downstream tasks. We investigate this question using persona-based prompting strategies in multimodal hate-speech detection tasks, specifically focusing on hate speech in memes. Our analysis reveals that when mapping personas onto a political compass and measuring persona agreement, inherent political positioning has surprisingly little correlation with classification decisions. Notably, this lack of correlation persists even when personas are explicitly injected with stronger ideological descriptors. Our findings suggest that while LLMs can exhibit political biases in their responses to direct political questions, these biases may have less impact on practical classification tasks than previously assumed. This raises important questions about the necessity of computationally expensive political pretraining for achieving fair performance in downstream tasks.

Paper Structure

This paper contains 35 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Political compass distribution of PersonaHub personas when impersonated by IDEFICS-3. Darker regions indicate higher density of personas on a logarithmic scale. Colored dots represent selected extreme personas: (a) shows personas selected from all four corners of the political compass, (b) shows personas selected from the economic extremes.
  • Figure 2: Matrix showing Cohen's kappa scores for classification agreement between personas from different political quadrants. Diagonal shows intra-quadrant agreement, while off-diagonal elements show inter-quadrant agreement.
  • Figure 3: Agreement patterns between personas on the Hateful Memes dataset. The left two plots show the highest and lowest agreements between the 60 personas across all political quadrants, while the right two plots show the same for the 40 personas from the economic extremes. Yellow lines indicate the top 5 strongest/weakest agreement pairs; grey lines show the next 5 pairs.
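
The quadrant agreement matrix described in Figure 2 can be illustrated with a minimal sketch. It computes Cohen's kappa between two personas' label sequences and then averages the pairwise scores by quadrant pair. The averaging step, the function names, the quadrant labels, and the toy predictions are all assumptions made for illustration; the listing above does not specify how the authors aggregate scores per quadrant.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention (assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def quadrant_agreement(preds, quadrant_of):
    """Mean pairwise kappa for every (quadrant, quadrant) pair.

    preds: persona id -> list of binary labels (hypothetical data layout)
    quadrant_of: persona id -> quadrant name (hypothetical labels)
    Averaging pairwise scores is an assumed aggregation, not the paper's.
    """
    scores = {}
    for p, q in combinations(sorted(preds), 2):
        key = tuple(sorted((quadrant_of[p], quadrant_of[q])))
        scores.setdefault(key, []).append(cohen_kappa(preds[p], preds[q]))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Toy example: 1 = hateful, 0 = not hateful (fabricated for illustration only).
toy_preds = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 0, 0],
    "c": [1, 0, 1, 0],
}
toy_quadrants = {"a": "left-lib", "b": "left-lib", "c": "right-auth"}
matrix = quadrant_agreement(toy_preds, toy_quadrants)
```

Keys with two identical quadrant names correspond to the diagonal (intra-quadrant) entries of such a matrix, and keys with different names correspond to the off-diagonal (inter-quadrant) entries.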