Topic Shifts as a Proxy for Assessing Politicization in Social Media

Marcelo Sartori Locatelli, Pedro Calais, Matheus Prado Miranda, João Pedro Junho, Tomas Lacerda Muniz, Wagner Meira, Virgilio Almeida

TL;DR

This paper tackles the problem of measuring politicization in social media by treating topic shifts from non-political to political discussions as a proxy for politicization. It introduces a seed-based Positive-Unlabeled (PU) learning framework: a two-step procedure that first uses spies to identify reliable negatives and then trains an XGBoost classifier on word2vec features, achieving $F1$ scores of about $0.86$ for news posts and $0.80$ for comments. The approach is applied to multi-platform data (Twitter, YouTube, TikTok) from Brazil's 2022 elections, revealing widespread politicization across both hard and soft topics, with notable temporal spikes around political events. The work provides scalable, label-efficient insights into politicization, highlights platform and topic differences, and proposes avenues for relating politicization to polarization and user-level dynamics in future research.

Abstract

Politicization is a social phenomenon studied in political science, characterized by the extent to which ideas and facts are given a political tone. A range of topics, such as climate change, religion, and vaccines, has been subject to increasing politicization in the media and on social media platforms. In this work, we propose a computational method for assessing politicization in online conversations based on topic shifts, i.e., the degree to which people switch topics in online conversations. The intuition is that topic shifts from a non-political topic to politics are a direct measure of politicization (making something political), and that the more people switch conversations to politics, the more they perceive politics as playing a vital role in their daily lives. A fundamental challenge in studying politicization in social media is that, a priori, any topic may be politicized. Hence, keyword-based methods, and even machine learning approaches that rely on topic labels to classify topics, are expensive to run and potentially ineffective. Instead, we learn from a seed of political keywords and use Positive-Unlabeled (PU) Learning to detect political comments in reaction to non-political news articles posted on Twitter, YouTube, and TikTok during the 2022 Brazilian presidential elections. Our findings indicate that all platforms show evidence of politicization, as discussions around topics adjacent to politics, such as economy, crime, and drugs, tend to shift to politics. Even for the least politicized topics, the rate at which discussions shift to politics increased in the lead-up to the elections and after other political events in Brazil, which is further evidence of politicization.
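The two-step PU procedure summarized in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: a small naive Bayes scorer over word tokens stands in for the word2vec features and XGBoost classifier, and the spy fraction, the threshold rule (the lowest spy score), and all documents below are assumptions made up for illustration.

```python
import math
import random
from collections import Counter

def train_nb(pos_docs, neg_docs):
    """Fit a two-class multinomial naive Bayes model with add-one smoothing."""
    vocab = {w for d in pos_docs + neg_docs for w in d}
    model = {}
    for label, docs in (("pos", pos_docs), ("neg", neg_docs)):
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values()) + len(vocab)
        model[label] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    model["prior"] = math.log(len(pos_docs) / len(neg_docs))
    return model

def political_score(model, doc):
    """Log-odds that `doc` is political; words outside the vocabulary are ignored."""
    score = model["prior"]
    for w in doc:
        if w in model["pos"]:
            score += model["pos"][w] - model["neg"][w]
    return score

def spy_reliable_negatives(positives, unlabeled, spy_frac=0.2, seed=0):
    """Step 1: hide some positives (spies) among the unlabeled examples,
    train positives vs. unlabeled, and keep as reliable negatives the
    unlabeled examples that score below every spy."""
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    n_spies = max(1, int(len(shuffled) * spy_frac))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]
    model = train_nb(kept, unlabeled + spies)
    threshold = min(political_score(model, s) for s in spies)  # assumed rule
    return [d for d in unlabeled if political_score(model, d) < threshold]

def two_step_pu(positives, unlabeled):
    """Step 2: train an ordinary binary classifier on positives vs. reliable negatives."""
    reliable_neg = spy_reliable_negatives(positives, unlabeled)
    return train_nb(positives, reliable_neg), reliable_neg

# Toy data (invented): seed-matched political posts and an unlabeled pool
# that hides one political document among non-political ones.
positives = [
    ["election", "vote"], ["vote", "president"], ["president", "congress"],
    ["congress", "senate"], ["senate", "election"],
]
unlabeled = [
    ["soccer", "goal"], ["recipe", "cake"], ["soccer", "match"],
    ["movie", "actor"], ["goal", "match"],
    ["election", "vote", "president"],  # hidden political example
]

model, reliable_neg = two_step_pu(positives, unlabeled)
print(len(reliable_neg), "reliable negatives")
print(political_score(model, ["election", "president"]) > 0)
```

Using the lowest spy score as the cutoff assumes no noisy spies; a more tolerant variant would let a small share of spies fall below the threshold.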

Paper Structure (14 sections, 6 figures, 9 tables, 1 algorithm)


Figures (6)

  • Figure 1: The two-step Positive-Unlabeled (PU) learning technique. Step 1 is fed with political and unlabeled examples and divides the unlabeled set into two sets -- reliable non-political and a smaller unlabeled set. Step 2 is a traditional binary classifier fed with political and reliable non-political examples. Squares represent examples treated as unlabeled during the first step, while circles represent examples treated as labeled. Red represents examples classified as non-political, blue represents political and yellow, unlabeled.
  • Figure 2: Confusion matrix for news post and comment predictions from the XGBoost PU learning model. The numbers inside the parentheses show the comment predictions, while those outside show news post predictions. Performance is superior for news posts, possibly because comments have less context and structure than well-formed news headlines. Note that the errors are relatively well balanced.
  • Figure 3: Probability density function (PDF) of topic shifts on all platforms. While discussions about politics and soccer typically remain on-topic, political comments frequently emerge in non-political discussions, causing a second peak in the probability of topic shifting in the panel for non-political posts (label fig:np-transitions).
  • Figure 4: Cumulative Distribution Functions (CDFs) of the percentage of comments exhibiting Topic Shifts for Twitter and TikTok posts. Distribution for YouTube was omitted due to its similarity to the Twitter one.
  • Figure 5: Weekly ratio of YouTube comments that are a topic shift from the news post they respond to. During the two rounds of the Brazilian elections, there are spikes in the politicization of non-political content. The gray area is the confidence interval.
  • ...and 1 more figure
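The weekly ratio described in the Figure 5 caption can be computed along the following lines. This is a sketch under stated assumptions: the input tuple layout and the function name `weekly_shift_ratio` are hypothetical, the comments are invented, and the normal-approximation interval is an assumption, since the caption does not say how the confidence interval was obtained.

```python
import math
from collections import defaultdict
from datetime import date

def weekly_shift_ratio(comments, z=1.96):
    """Return {(iso_year, iso_week): (ratio, ci_low, ci_high)}.

    `comments` is an iterable of (day, post_is_political, comment_is_political).
    Only comments on non-political posts are counted; a political comment
    on such a post is a topic shift.
    """
    shifts = defaultdict(int)
    totals = defaultdict(int)
    for day, post_political, comment_political in comments:
        if post_political:
            continue  # comments on political posts cannot be topic shifts to politics
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        shifts[week] += int(comment_political)
    result = {}
    for week in sorted(totals):
        n = totals[week]
        p = shifts[week] / n
        half = z * math.sqrt(p * (1 - p) / n)  # assumed: normal approximation
        result[week] = (p, max(0.0, p - half), min(1.0, p + half))
    return result

# Invented comments spanning two ISO weeks of 2022.
comments = [
    (date(2022, 9, 26), False, False),
    (date(2022, 9, 27), False, True),
    (date(2022, 9, 28), False, False),
    (date(2022, 9, 29), False, False),
    (date(2022, 9, 30), True, True),   # political post, ignored
    (date(2022, 10, 3), False, True),
    (date(2022, 10, 4), False, True),
    (date(2022, 10, 5), False, False),
    (date(2022, 10, 6), False, True),
]

ratios = weekly_shift_ratio(comments)
for week, (p, low, high) in ratios.items():
    print(week, round(p, 2), round(low, 2), round(high, 2))
```

With so few comments per week the interval is very wide; on real data each week would contain far more comments, and an interval better suited to proportions near 0 or 1 may be preferable.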