Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks

Joshua Melton, Shannon Reid, Gabriel Terejanu, Siddharth Krishnan

TL;DR

This work tackles large-scale stance labeling on social media with a two-stage framework that combines textual signals with network structure: reciprocal label propagation on a user-hashtag bipartite graph seeds stance labels, and graph neural networks (GNNs) are then trained semi-supervised on a signed, weighted user-user interaction graph whose node features are transformer-based user embeddings. On climate change and gun control datasets, the approach consistently improves over text-only baselines. GraphSAGE and especially GAT exploit social connections to raise macro F1 (best: $F1 \approx 91.4\%$ on gun control), while zero-shot GPT-4 sometimes rivals or surpasses the transformer baselines on climate change. The results show the value of combining social science insight with scalable graph-based learning to study online polarization, particularly since discourse dynamics differ across topics, and point to minimally supervised stance labeling as a basis for large-scale, policy-relevant analysis of polarization.
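The first stage can be illustrated with a minimal NumPy sketch of reciprocal propagation on a user-hashtag count matrix. It is not the authors' algorithm: the stance scale in [-1, 1], the hand-picked seed hashtags, the usage-count weighting, the clamping of seeds, and the function name `propagate_stance` are all assumptions made for illustration. Users take the weighted average stance of the hashtags they use, hashtags take the weighted average stance of the users who post them, and the two updates alternate until the scores stop changing; the magnitude of a score can then serve as a soft confidence.

```python
import numpy as np


def propagate_stance(counts, seed_scores, n_iter=50, tol=1e-6):
    """Alternate user <- hashtag and hashtag <- user averaging on a bipartite graph.

    counts: (n_users, n_hashtags) array; counts[u, h] = times user u used hashtag h.
    seed_scores: {hashtag_index: stance in [-1, 1]} for hand-labeled seed hashtags (assumed).
    Returns (user_scores, hashtag_scores); sign = stance, magnitude = confidence.
    """
    counts = np.asarray(counts, dtype=float)
    n_users, n_tags = counts.shape
    seed_idx = np.array(list(seed_scores.keys()), dtype=int)
    seed_val = np.array(list(seed_scores.values()), dtype=float)

    tag_scores = np.zeros(n_tags)
    tag_scores[seed_idx] = seed_val
    user_scores = np.zeros(n_users)

    # Degrees turn count-weighted sums into averages; isolated nodes keep score 0.
    user_deg = counts.sum(axis=1)
    tag_deg = counts.sum(axis=0)
    user_deg[user_deg == 0] = 1.0
    tag_deg[tag_deg == 0] = 1.0

    for _ in range(n_iter):
        # Each user takes the usage-weighted mean stance of its hashtags.
        new_users = counts @ tag_scores / user_deg
        # Each hashtag takes the usage-weighted mean stance of its users.
        new_tags = counts.T @ new_users / tag_deg
        # Seed hashtags are clamped so the known stances anchor the propagation.
        new_tags[seed_idx] = seed_val
        change = max(np.abs(new_users - user_scores).max(),
                     np.abs(new_tags - tag_scores).max())
        user_scores, tag_scores = new_users, new_tags
        if change < tol:
            break
    return user_scores, tag_scores


# Toy graph: hashtag 0 is a pro seed, hashtag 1 an anti seed, hashtag 2 unlabeled.
counts = np.array([[3, 0, 1],
                   [0, 3, 0],
                   [0, 0, 2]])
users, tags = propagate_stance(counts, {0: 1.0, 1: -1.0})
print(np.round(users, 3), np.round(tags, 3))
```

In the toy example, hashtag 2 is shared by user 0 (who also uses the pro seed) and user 2 (who uses nothing else), so hashtag 2 and user 2 both drift toward +1, while user 1 stays at the anti seed's -1. Clamping the seeds matters: without it, repeated averaging would drive every connected component toward a single shared value.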

Abstract

The high volume and rapid evolution of content on social media present major challenges for studying the stance of social media users. In this work, we develop a two-stage stance labeling method that utilizes the user-hashtag bipartite graph and the user-user interaction graph. In the first stage, a simple and efficient heuristic for stance labeling uses the user-hashtag bipartite graph to iteratively update the stance association of user and hashtag nodes via a label propagation mechanism. This set of soft labels is then integrated with the user-user interaction graph to train a graph neural network (GNN) model using semi-supervised learning. We evaluate this method on two large-scale datasets containing tweets related to climate change from June 2021 to June 2022 and gun control from January 2022 to January 2023. Our experiments demonstrate that enriching text-based embeddings of users with network information from the user interaction graph using our semi-supervised GNN method outperforms both classifiers trained on user textual embeddings and zero-shot classification using LLMs such as GPT-4. We discuss the need for integrating nuanced understanding from social science with the scalability of computational methods to better understand how polarization on social media occurs for divisive issues such as climate change and gun control.
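The second stage can be approximated, for illustration only, by the following sketch. It is a simplified stand-in in the spirit of Simplified Graph Convolution (SGC): one fixed round of neighbor aggregation followed by a linear classifier, not the GraphSAGE or GAT models the paper evaluates. Stage-one scores in [-1, 1] are mapped to targets in [0, 1], only users whose scores pass a confidence threshold are used for training, and every user is then scored. The TL;DR describes the interaction graph as signed and weighted; treating a negative edge as flipping a neighbor's contribution, the 0.5 threshold, the toy data, and the function names `signed_mean_aggregate` and `train_on_soft_labels` are all assumptions.

```python
import numpy as np


def signed_mean_aggregate(adj, feats):
    """Concatenate each user's features with the signed, weighted mean of its neighbors'.

    adj: (n, n) signed edge weights (assumed: > 0 supportive, < 0 adversarial interaction).
    feats: (n, d) user embeddings, e.g. from a transformer text encoder.
    """
    adj = np.asarray(adj, dtype=float)
    norm = np.abs(adj).sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0  # users with no interactions get a zero neighbor term
    neigh = adj @ feats / norm  # a negative edge flips that neighbor's contribution
    return np.concatenate([feats, neigh], axis=1)


def train_on_soft_labels(h, soft, mask, lr=0.5, epochs=500):
    """Fit logistic regression on masked users only, then score every user.

    soft: (n,) stage-one targets in [0, 1]; mask: (n,) bool, True = use as training target.
    """
    hb = np.concatenate([h, np.ones((h.shape[0], 1))], axis=1)  # append bias column
    w = np.zeros(hb.shape[1])
    x, y = hb[mask], soft[mask]
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * x.T @ (p - y) / len(y)  # gradient of mean binary cross-entropy
    return 1.0 / (1.0 + np.exp(-(hb @ w)))  # P(positive stance) for all users


# Toy graph: users 0-1 and 2-3 form two camps; user 4 has neutral text (feature 0)
# but supportive ties to users 0 and 1 and an adversarial tie to user 2.
adj = np.zeros((5, 5))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (4, 0, 1.0), (4, 1, 1.0), (4, 2, -1.0)]:
    adj[i, j] = adj[j, i] = s
feats = np.array([[1.0], [1.0], [-1.0], [-1.0], [0.0]])
scores = np.array([1.0, 1.0, -1.0, -1.0, 0.0])  # stage-one stance scores in [-1, 1]
mask = np.abs(scores) >= 0.5                     # hypothetical confidence threshold
preds = train_on_soft_labels(signed_mean_aggregate(adj, feats), (scores + 1) / 2, mask)
print(np.round(preds, 3))
```

User 4's own text feature is 0, so its positive prediction comes entirely from the aggregated neighbor column; the adversarial edge to user 2 adds to that evidence rather than cancelling it. The paper's GraphSAGE and GAT models instead learn the aggregation during training, and GAT in particular learns per-neighbor attention weights rather than the fixed averaging used here.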

Paper Structure (12 sections, 2 equations, 2 figures, 5 tables, 3 algorithms)

Figures (2)

  • Figure 1: Timeline of the number of Twitter conversations about climate change started each day.
  • Figure 2: Timeline of the number of Twitter conversations about gun control started each day.