Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs

Mugdha Pandya, Mali Jin, Kalina Bontcheva, Diana Maynard

TL;DR

This work addresses online hostility toward UK MPs on X by building a dedicated, expert-annotated dataset of 3,320 English tweets labeled for hostility and for the identity characteristic targeted. It conducts linguistic and topical analyses to reveal the patterns and topics associated with hostility, and evaluates a range of models (standard transformers, domain-adapted models, and large language models) across flat and hierarchical classification schemes for binary hostility detection and multi-class identity labeling. Key findings include the prominence of race-based hostility, the positive impact of providing identity-type definitions to LLMs, and improved PLM performance from a domain-adapted RoBERTa. The dataset and insights offer a resource for research on UK-specific political hostility and a foundation for developing better detection and mitigation strategies in online political discourse.

Abstract

Numerous politicians use social media platforms, particularly X, to engage with their constituents. This interaction allows constituents to pose questions and offer feedback but also exposes politicians to a barrage of hostile responses, especially given the anonymity afforded by social media. They are typically targeted in relation to their governmental role, but the comments also tend to attack their personal identity. This can discredit politicians and reduce public trust in the government. It can also incite anger and disrespect, leading to offline harm and violence. While numerous models exist for detecting hostility in general, they lack the specificity required for political contexts. Furthermore, addressing hostility towards politicians demands tailored approaches due to the distinct language and issues inherent to each country (e.g., Brexit for the UK). To bridge this gap, we construct a dataset of 3,320 English tweets spanning a two-year period manually annotated for hostility towards UK MPs. Our dataset also captures the targeted identity characteristics (race, gender, religion, none) in hostile tweets. We perform linguistic and topical analyses to delve into the unique content of the UK political data. Finally, we evaluate the performance of pre-trained language models and large language models on binary hostility detection and multi-class targeted identity type classification tasks. Our study offers valuable data and insights for future research on the prevalence and nature of politics-related hostility specific to the UK.

Paper Structure

This paper contains 37 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Annotation platform user interface.
  • Figure 2: Comparing political party-based differences in the amount and type of hostility received.
  • Figure 3: Comparing identity-based differences in the amount and type of hostility received.
  • Figure 4: Top 100 BOW unigrams associated with hostile and non-hostile tweets. The larger the text size, the higher the Pearson correlation coefficient $r$, and vice versa.
  • Figure 5: Top 100 BOW bigrams associated with hostile and non-hostile tweets. The larger the text size, the higher the Pearson correlation coefficient $r$, and vice versa.
  • ...and 1 more figure
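
Figures 4 and 5 rank bag-of-words n-grams by their Pearson correlation coefficient $r$ with the hostile label. The sketch below shows one way such a score could be computed for unigrams. It is a minimal illustration, not the authors' pipeline: the whitespace tokenizer, the use of raw per-tweet counts as the feature, the function names `pearson_r` and `unigram_correlations`, and the toy tweets are all assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant feature or label has no defined correlation; report 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def unigram_correlations(tweets, labels):
    """Map each unigram to Pearson r between its per-tweet count and the label."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [tweet.lower().split() for tweet in tweets]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    return {
        word: pearson_r([toks.count(word) for toks in tokenized], labels)
        for word in vocab
    }


# Toy, invented examples (not drawn from the dataset); 1 = hostile, 0 = non-hostile.
tweets = [
    "you are a disgrace",
    "thank you for your work",
    "total disgrace resign",
    "great work today",
]
labels = [1, 0, 1, 0]
scores = unigram_correlations(tweets, labels)
```

Under these assumptions, a unigram with positive $r$ appears more often in hostile tweets and one with negative $r$ appears more often in non-hostile tweets. Sorting `scores` by value and keeping the top 100 in each direction would give the kind of word lists the figure captions describe.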