HTIM: Hybrid Text-Interaction Modeling for Broadening Political Leaning Inference in Social Media
Joseba Fernandez de Landa, Arkaitz Zubiaga, Rodrigo Agerri
TL;DR
HTIM addresses the challenge of inferring political leaning beyond the binary Left-Right distinction by fusing textual content and social interactions in a multi-party, multi-region setting. It introduces a flexible hybrid framework that combines text-based representations (TF-IDF, Word2Vec, Transformers) with interaction-based embeddings (DeepWalk, Node2Vec, Relational Embeddings), and demonstrates that HTIM yields higher macro-F1 scores than text-only or interaction-only models across three UK regions and across engagement levels, with the largest gains for less-engaged users. The work provides a new dataset spanning Scotland, Wales, and Northern Ireland with Members, Supporters, and Sympathizers, and shows that while interactions alone are strong, integrating them with text is essential for broad applicability. The results have practical implications for large-scale public-opinion analyses and motivate future work on missing-data scenarios and on applying HTIM to related tasks such as hate-speech and misinformation detection.
Abstract
Political leaning can be defined as the inclination of an individual towards certain political orientations that align with their personal beliefs. Political leaning inference has traditionally been framed as a binary classification problem, namely, distinguishing between left vs. right or conservative vs. liberal. Furthermore, although some recent work considers political leaning inference in a multi-party, multi-region framework, it is limited to social interaction data. To address these shortcomings, we propose Hybrid Text-Interaction Modeling (HTIM), a framework that fuses text and interactions from social media to accurately identify the political leaning of users in a multi-party, multi-region framework. Access to both textual and interaction-based data not only allows us to compare these data sources but also avoids reliance on a specific data type. We show that, while state-of-the-art text-based representations on their own do not improve over interaction-based representations, combining text-based and interaction-based modeling with HTIM considerably improves performance across the three regions, and the improvement is more prominent in the most challenging cases, involving users who are less engaged in politics.
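
As a minimal illustration of the hybrid idea, the sketch below assumes the simplest possible fusion rule: an L2-normalized text vector is concatenated with an L2-normalized interaction embedding to form one user representation. It also shows how macro-F1 weights every party equally in a multi-party setting. The fusion operator is not specified here, so `l2_normalize`, `fuse`, `macro_f1`, and the `party_*` labels are hypothetical names chosen for illustration and should not be read as the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, interaction_vec):
    """Assumed fusion rule: concatenate the two normalized views of a user.

    Normalizing first keeps either view from dominating purely because of scale.
    """
    return l2_normalize(text_vec) + l2_normalize(interaction_vec)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so small parties count as much as large ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy user: a 2-dimensional text vector and a 2-dimensional interaction embedding.
hybrid = fuse([3.0, 4.0], [0.0, 2.0])

# Toy multi-party evaluation with hypothetical party labels.
gold = ["party_a", "party_a", "party_b", "party_c"]
pred = ["party_a", "party_b", "party_b", "party_c"]
score = macro_f1(gold, pred)
```

The fused vector can be passed to any standard classifier. Concatenation is only one option; weighted sums or learned fusion layers would fit the same interface.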
