HTIM: Hybrid Text-Interaction Modeling for Broadening Political Leaning Inference in Social Media

Joseba Fernandez de Landa, Arkaitz Zubiaga, Rodrigo Agerri

TL;DR

HTIM addresses the challenge of inferring political leaning beyond the binary left-right split by fusing textual content and social interactions in a multi-party, multi-region setting. It introduces a flexible hybrid framework that combines text-based representations (TF-IDF, Word2Vec, Transformers) with interaction-based embeddings (DeepWalk, Node2Vec, Relational Embeddings), and shows that HTIM achieves higher macro-F1 scores than text-only or interaction-only representations across three UK regions and across engagement levels, with the largest gains for less-engaged users. The work provides a new dataset spanning Scotland, Wales, and Northern Ireland, with users grouped into three engagement levels (Members, Supporters, and Sympathizers). It shows that although interaction features are strong predictors on their own, integrating them with text is essential for broad applicability. The results have practical implications for large-scale public-opinion analysis and motivate future work on missing-data scenarios and on applying HTIM to related tasks such as hate-speech and misinformation detection.
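Macro-F1 averages the per-class F1 scores without weighting them by class size, which matters in a multi-party setting where some parties have far fewer users than others. The minimal, dependency-free sketch below illustrates the metric; the party labels are hypothetical placeholders, not the dataset's actual classes.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-F1).

    Every class contributes equally, so a small party counts as much as a large one.
    Classes with no true or predicted examples get precision/recall of 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    f1_scores = []
    for label in labels:
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels: a classifier that always predicts the majority party.
y_true = ["party_A", "party_A", "party_A", "party_B", "party_C"]
y_pred = ["party_A"] * 5
print(macro_f1(y_true, y_pred))
```

Here accuracy is 0.6, but macro-F1 is only 0.25, because the two minority parties each score an F1 of 0.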

Abstract

Political leaning can be defined as the inclination of an individual towards certain political orientations that align with their personal beliefs. Political leaning inference has traditionally been framed as a binary classification problem, namely, to distinguish between left vs. right or conservative vs. liberal. Furthermore, although some recent work considers political leaning inference in a multi-party multi-region framework, these studies rely solely on social interaction data. To address these shortcomings, in this study we propose Hybrid Text-Interaction Modeling (HTIM), a framework that fuses text and interactions from social media to accurately identify the political leaning of users in a multi-party multi-region framework. Having access to both textual and interaction-based data not only allows us to compare these data sources but also avoids reliance on a single data type. We show that, while state-of-the-art text-based representations on their own do not outperform interaction-based representations, combining text-based and interaction-based modeling with HTIM considerably improves performance across the three regions, an improvement that is even larger for the most challenging cases, namely users who are less engaged in politics.
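As a rough illustration of the fusion idea, the sketch below builds one TF-IDF vector per user from their tweets and concatenates it with a precomputed interaction embedding. The whitespace tokenizer, the smoothed IDF and L2 normalization (mirroring the defaults of scikit-learn's `TfidfVectorizer`), and the toy tweets and embedding values are all assumptions for illustration, not the paper's exact configuration.

```python
import math
from collections import Counter


def tfidf_vectors(user_docs):
    """Return a shared vocabulary and one L2-normalized TF-IDF vector per user.

    user_docs maps a user id to the concatenated text of that user's tweets.
    """
    tokenized = {u: doc.lower().split() for u, doc in user_docs.items()}  # naive tokenizer
    vocab = sorted({tok for toks in tokenized.values() for tok in toks})
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized.values() for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = {}
    for user, toks in tokenized.items():
        counts = Counter(toks)
        vec = [counts[tok] * idf[tok] for tok in vocab]  # raw term frequency x IDF
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[user] = [x / norm for x in vec]
    return vocab, vectors


def hybrid_representation(text_vec, interaction_vec):
    """Early fusion: concatenate text-based and interaction-based features."""
    return list(text_vec) + list(interaction_vec)


# Hypothetical toy data: tweets per user and a 2-d interaction embedding per user,
# standing in for the output of DeepWalk, Node2Vec or relational embeddings.
tweets = {
    "u1": "independence referendum now",
    "u2": "union strong together",
}
interaction = {"u1": [0.9, -0.1], "u2": [-0.8, 0.3]}

vocab, text_vecs = tfidf_vectors(tweets)
hybrid = {u: hybrid_representation(text_vecs[u], interaction[u]) for u in tweets}
print(len(vocab), len(hybrid["u1"]))
```

Concatenation keeps both feature sets intact and leaves it to the downstream classifier to weight them; because sparse TF-IDF vectors and dense embeddings can differ in scale, rescaling the two blocks is a reasonable refinement to consider.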

Paper Structure

This paper contains 29 sections, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Use of pre-trained Transformer-based language models (LM) and Word2vec embeddings (w2v) to extract text-based user representations: (1) at tweet level, (2) at user level, and (3) combining tweet- and user-level features.
  • Figure 2: Hybrid Text-Interaction Modeling (HTIM) applied to each user, tweet by tweet: (1) tweet-level text representation, (2) user-level text representation, (3) combination of tweet- and user-level text features, (4) user-level interaction representation, and (5) final hybrid representation concatenating all vectors.
  • Figure 3: Performance variations of interaction-based approaches (RE, N2V and DW), the best text-based approaches (tfidf, dB and Rt) and the corresponding HTIM approaches (RE+tfidf, RE+dB and RE+Rt) across levels of political engagement on the SCT (left), WAL (center) and NIR (right) datasets.
  • Figure 4: Performance variations of interaction-based approaches (RE, N2V and DW), the best text-based approaches (tfidf, dB and Rt) and the corresponding HTIM approaches (RE+tfidf, RE+dB and RE+Rt) across regions for the members (left), supporters (center) and sympathizers (right) datasets.
  • Figure 5: Confusion matrices of SCT Sympathizer users trained with RE (left) or RE+tfidf (right) representations.
  • Figures 6 to 9 are not listed in this excerpt.
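Figures 1 and 2 describe how a single user's representation is assembled from five parts. A minimal sketch of that assembly follows, assuming that tweet-level vectors are mean-pooled into one user vector and that the "combination" in step (3) is a plain concatenation; the captions do not specify either choice, and the toy vectors are hypothetical stand-ins for LM/w2v and RE outputs.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(tweet_vectors: Sequence[Sequence[float]]) -> Vector:
    """Aggregate per-tweet embeddings into one vector by averaging (assumed pooling)."""
    if not tweet_vectors:
        raise ValueError("user has no tweets")
    n = len(tweet_vectors)
    dim = len(tweet_vectors[0])
    return [sum(vec[i] for vec in tweet_vectors) / n for i in range(dim)]


def build_hybrid(
    tweet_vectors: Sequence[Sequence[float]],
    user_text_vector: Sequence[float],
    interaction_vector: Sequence[float],
) -> Vector:
    tweet_level = mean_pool(tweet_vectors)    # (1) tweet-level text representation
    user_level = list(user_text_vector)       # (2) user-level text representation
    text_combined = tweet_level + user_level  # (3) tweet + user text features (assumed concatenation)
    interaction = list(interaction_vector)    # (4) user-level interaction representation
    return text_combined + interaction        # (5) final hybrid representation


# Hypothetical toy inputs: two 3-d tweet embeddings, a 3-d user-level text
# embedding, and a 2-d interaction embedding.
tweets = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
user_text = [0.5, 0.5, 0.5]
graph_emb = [0.1, -0.2]

features = build_hybrid(tweets, user_text, graph_emb)
print(features)  # [2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.1, -0.2]
```

The resulting vector would then be fed to a standard classifier; the final dimensionality is simply the sum of the dimensions of the concatenated parts.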