Unveiling Behavioral Differences in Bilingual Information Operations: A Network-Based Approach

Bowen Yi

TL;DR

This study develops a language-aware, network-based framework for detecting information-operation (IO) drivers on Twitter during the 2024 U.S. presidential election. It fuses Co-Domain, Co-Hashtag, and Text Similarity networks and applies unsupervised clustering. English and Spanish IO drivers exhibit distinct topics, domains, and engagement patterns, and bilingual users play unique bridging roles with language-dependent behaviors. The authors also introduce a label-free evaluation method for clustering quality and show that fixed edge-filtering or node-pruning parameters may hamper cross-language performance, underscoring the need for language-specific tuning. Overall, the work highlights the importance of culturally and linguistically adaptable IO detection for mitigating influence campaigns and lays groundwork for multilingual, human-centered IO detection systems. Code and data are available on GitHub.

Abstract

Twitter has become a pivotal platform for conducting information operations (IOs), particularly during high-stakes political events. In this study, we analyze over a million tweets about the 2024 U.S. presidential election to explore an under-studied area: the behavioral differences of IO drivers from English- and Spanish-speaking communities. Using similarity graphs constructed from behavioral patterns, we identify IO drivers in both languages and evaluate the clustering quality of these graphs in an unsupervised setting. Our analysis demonstrates how different network dismantling strategies, such as node pruning and edge filtering, can impact clustering quality and the identification of coordinated IO drivers. We also reveal significant differences in the topics and political indicators between English and Spanish IO drivers. Additionally, we investigate bilingual users who post in both languages, systematically uncovering their distinct roles and behaviors compared to monolingual users. These findings underscore the importance of robust, culturally and linguistically adaptable IO detection methods to mitigate the risks of influence campaigns on social media. Our code and data are available on GitHub: https://github.com/bowenyi-pierre/humans-lab-hackathon-24.

Paper Structure (22 sections, 2 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Comparison of the 5 most popular web domains in English and Spanish tweets. Factuality and political leaning were obtained from MBFC. Non-informative domains without MBFC records were excluded from analysis. English tweets contained more right-leaning and lower-factuality domains compared to Spanish tweets.
  • Figure 2: Co-Domain networks for English (left) and Spanish (right) tweets. Each node represents a Twitter account, with size proportional to the eigenvector centrality. Clusters are colored differently, and key clusters are annotated with their characteristics, including the most widely shared web domains.
  • Figure 3: Co-Hashtag networks for English (left) and Spanish (right) tweets. Node sizes are proportional to eigenvector centrality. Key clusters are analyzed, including their five most common tags.
  • Figure 4: Text Similarity networks for English (left) and Spanish (right) tweets. In addition to common cluster characteristics, we illustrate the 5 most common topics for representative clusters.
  • Figure 5: Clustering quality evaluation for English (left) and Spanish (right) fused networks after five different operations. Small clusters without topics were excluded from the analysis. For weight-based edge filtering, only the top 30% of edges were retained.
  • ...and 3 more figures
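
The weight-based edge filtering mentioned in the Figure 5 caption (retaining only the top 30% of edges) can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard similarity over shared hashtags, the function names, and the toy accounts are all assumptions made for illustration, and the paper may compute edge weights differently.

```python
from itertools import combinations
import math


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed stand-in for the paper's edge weight)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_edges(user_items):
    """Connect every pair of users that shares at least one item (e.g. a hashtag)."""
    edges = {}
    for u, v in combinations(sorted(user_items), 2):
        w = jaccard(user_items[u], user_items[v])
        if w > 0:
            edges[(u, v)] = w
    return edges


def keep_top_edges(edges, fraction=0.30):
    """Weight-based edge filtering: retain only the heaviest `fraction` of edges."""
    if not edges:
        return {}
    k = max(1, math.ceil(fraction * len(edges)))
    # Sort by descending weight; break ties by edge name for determinism.
    ranked = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def connected_nodes(edges):
    """Nodes that still have at least one edge after filtering."""
    return {node for pair in edges for node in pair}


# Hypothetical toy data: account -> set of hashtags it used.
toy_users = {
    "a": {"#election2024", "#vote"},
    "b": {"#election2024", "#vote"},
    "c": {"#election2024", "#debate"},
    "d": {"#elecciones", "#voto"},
    "e": {"#elecciones", "#voto", "#debate"},
}

all_edges = build_similarity_edges(toy_users)
filtered = keep_top_edges(all_edges, fraction=0.30)
remaining = connected_nodes(filtered)
```

In this toy example, five weighted edges are built and the filter keeps the two strongest, one linking the English-hashtag pair and one linking the Spanish-hashtag pair, while the weakly connected account `c` drops out. A single fixed `fraction` treats both language networks identically, which is the kind of setting the TL;DR suggests may need language-specific tuning.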