Domain-based user embedding for competing events on social media
Wentao Xu, Kazutoshi Sasahara
TL;DR
This work introduces a domain-based user embedding method that leverages URL domain co-occurrence in retweet behavior to characterize polarized user clusters involved in competing events on social media. The approach constructs a domain co-occurrence network, learns a vector for each domain with Node2vec, and derives each user's embedding by summing the vectors of the domains appearing in that user's retweets. Across topics, it achieves higher classification accuracy and macro-F1 than network- and content-based baselines while also reducing computational cost. The method enables intuitive visualizations of user similarity and of the boundaries between opposing groups, providing a practical tool for studying echo chambers and polarization dynamics. Its robustness to data sparsity and its potential for integration with language models suggest broad applicability to computational social science analyses of social divide and information diffusion.
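The pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that two domains co-occur when the same user retweeted URLs from both (edge weight = number of such users), and it replaces the Node2vec step with a hypothetical hand-written `domain_vectors` dictionary; in the described method these vectors would instead be learned by running Node2vec on the weighted co-occurrence network. The helper names `build_domain_cooccurrence`, `user_embedding`, and `cosine`, and all domains and users, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_domain_cooccurrence(user_domains):
    """Return weighted, undirected edges as {(d1, d2): weight}.

    ASSUMPTION: two domains co-occur when the same user retweeted URLs
    from both; the weight counts how many users did so.
    """
    edges = Counter()
    for domains in user_domains.values():
        # Deduplicate per user and sort so each pair has one canonical key.
        for d1, d2 in combinations(sorted(set(domains)), 2):
            edges[(d1, d2)] += 1
    return edges


def user_embedding(domains, domain_vectors, dim):
    """Sum the vectors of every domain occurrence in a user's retweets.

    ASSUMPTION: a repeated domain is counted once per retweet, and domains
    without a learned vector are skipped.
    """
    vec = [0.0] * dim
    for d in domains:
        v = domain_vectors.get(d)
        if v is None:
            continue
        for i, x in enumerate(v):
            vec[i] += x
    return vec


def cosine(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy data: domains appearing in each user's retweets.
user_domains = {
    "u1": ["site-a.example", "site-b.example", "site-a.example"],
    "u2": ["site-b.example", "site-a.example"],
    "u3": ["site-c.example", "site-d.example"],
}
edges = build_domain_cooccurrence(user_domains)

# Hypothetical 2-D domain vectors standing in for Node2vec output.
domain_vectors = {
    "site-a.example": [1.0, 0.0],
    "site-b.example": [0.8, 0.2],
    "site-c.example": [0.0, 1.0],
    "site-d.example": [0.1, 0.9],
}
embeddings = {
    user: user_embedding(domains, domain_vectors, dim=2)
    for user, domains in user_domains.items()
}
print(edges)
print(cosine(embeddings["u1"], embeddings["u2"]), cosine(embeddings["u1"], embeddings["u3"]))
```

Because each user vector is a sum of domain vectors, a user who mostly retweets links from one cluster of domains points in that cluster's direction, so cosine similarity groups `u1` with `u2` and away from `u3` in this toy case. Counting a repeated domain once per retweet, rather than once per user, is a design choice the sketch makes explicitly.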
Abstract
Social divide and polarization have become significant societal issues. To understand the mechanisms behind these phenomena, social media analysis offers research opportunities in computational social science, where effective user embedding methods are essential for subsequent analysis. Researchers have traditionally used predefined network-based user features (e.g., network size, degree, and centrality measures). Because such measures may not capture the complex characteristics of social media users, we developed a method that embeds users based on a URL domain co-occurrence network. This approach effectively represents social media users involved in competing events such as political campaigns and public health crises. We assessed the method's performance with binary classification tasks on Twitter datasets covering topics associated with the COVID-19 infodemic, such as QAnon, Biden, and Ivermectin. Our results revealed that user embeddings generated directly from the retweet network and/or from language performed below expectations, whereas our domain-based embeddings outperformed those methods while reducing computation time. Therefore, domain-based embedding offers an accessible and effective method for characterizing social media users in competing events.
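The binary classification evaluation can be scored with accuracy and macro-F1, the metrics named in the TL;DR. The minimal stdlib sketch below computes both; the labels are hypothetical toy values, not data from the study, and the classifier itself is omitted (any model trained on the user embeddings could supply `y_pred`). The function names `accuracy`, `f1_for_class`, and `macro_f1` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class (0.0 if no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical labels for two opposing user groups (1 and 0).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1]  # a classifier that always predicts the larger group
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case the majority-class predictor reaches an accuracy of about 0.67 but a macro-F1 of only about 0.4, because macro-F1 averages the per-class scores and the smaller group scores zero. This makes macro-F1 a useful complement to accuracy when opposing groups differ in size.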
