Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs

Dzmitry Pihulski, Jan Kocoń

TL;DR

This study examines personalized offensiveness detection in political tweets by prompting LLMs with ideological and cultural personas across English, Polish, and Russian. It compares large reasoning-enabled models, large non-reasoning models, and smaller models with and without reasoning to assess how reasoning affects personalization and interpretability. Findings show that reasoning-enabled models achieve stronger ideological discrimination and cross-language consistency, while non-reasoning or small models struggle to capture nuanced perspectives; English-dominant reasoning is a notable artifact. The work advances a multilingual, persona-driven framework and releases data and methods to evaluate reasoning-driven personalization in sociopolitical text classification, highlighting both potential benefits and ethical considerations.

Abstract

We explore how large language models (LLMs) assess offensiveness in political discourse when prompted to adopt specific political and cultural perspectives. Using a multilingual subset of the MD-Agreement dataset centered on tweets from the 2020 US elections, we evaluate several recent LLMs (including DeepSeek-R1, o4-mini, GPT-4.1-mini, Qwen3, Gemma, and Mistral) tasked with judging tweets as offensive or non-offensive from the viewpoints of varied political personas (far-right, conservative, centrist, progressive) across English, Polish, and Russian contexts. Our results show that larger models with explicit reasoning abilities (e.g., DeepSeek-R1, o4-mini) are more consistent and sensitive to ideological and cultural variation, while smaller models often fail to capture subtle distinctions. We find that reasoning capabilities significantly improve both the personalization and interpretability of offensiveness judgments, suggesting that such mechanisms are key to adapting LLMs for nuanced sociopolitical text classification across languages and ideologies.

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Example personality representation used in the prompt. The model receives not only the name, age, sex, and political affiliation, but also the nationality, allowing us to explore how individuals from different nationalities might respond to the same text.
  • Figure 2: Correlation plots showing the agreement in tweet classifications across different political perspectives for two reasoning-enabled language models: DeepSeek-R1 and OpenAI's o4-mini. Each plot visualizes how consistently each model assigns offensive/non-offensive labels based on the provided personality profiles, highlighting alignment or divergence across political viewpoints.
  • Figure 3: Upset plot showing the overlap in classification decisions between the 'Moderate Conservative Polish' and 'Progressive Left English' personalities using the DeepSeek-R1 model. Despite a relatively low Pearson correlation ($r = 0.27$), the two groups agree on approximately 54% of the samples, highlighting the limitations of using correlation alone to assess agreement in binary classification tasks.
  • Figure 4: Upset plots showing the intersection of classification decisions across English, Polish, and Russian for each political group for the DeepSeek-R1 model's responses. These plots visualize how consistently the model classified tweets as offensive or not across different nationalities and languages within each personality profile.
  • Figure 5: Correlation plots showing the agreement in tweet classifications across different political perspectives for two reasoning-disabled language models: DeepSeek-V3 and OpenAI's GPT-4.1-mini. Each plot visualizes how consistently each model assigns offensive/non-offensive labels based on the provided personality profiles, highlighting alignment or divergence across political viewpoints.
  • ...and 2 more figures
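
The Figure 3 caption notes that Pearson correlation alone can misrepresent agreement between two sets of binary labels. The sketch below illustrates that point on synthetic data only; the label vectors, the function names `agreement_rate` and `pearson_binary`, and the 100-sample size are assumptions made for illustration and are not taken from the paper or its dataset. It computes the raw share of matching labels and the Pearson correlation (equivalent to the phi coefficient for 0/1 labels) for two hypothetical personas that rarely flag tweets as offensive.

```python
from math import sqrt


def agreement_rate(a, b):
    """Fraction of samples on which two binary label lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_binary(a, b):
    """Pearson correlation of two 0/1 label lists (the phi coefficient)."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_a * var_b)


# Synthetic labels (1 = offensive, 0 = non-offensive), NOT the paper's data:
# 2 tweets flagged by both personas, 8 only by A, 8 only by B, 82 by neither.
persona_a = [1] * 2 + [1] * 8 + [0] * 8 + [0] * 82
persona_b = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 82

agreement = agreement_rate(persona_a, persona_b)
r = pearson_binary(persona_a, persona_b)

print(f"agreement = {agreement:.2f}")  # agreement = 0.84
print(f"pearson r = {r:.3f}")          # pearson r = 0.111
```

In this synthetic case the two personas match on 84% of samples while the correlation is only about 0.11, because most of the agreement comes from the large shared non-offensive class. This is one reason to read an overlap view such as an upset plot alongside a correlation matrix.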