
Discovering Latent Themes in Social Media Messaging: A Machine-in-the-Loop Approach Integrating LLMs

Tunazzina Islam, Dan Goldwasser

TL;DR

The paper presents a machine-in-the-loop framework that integrates large language models (LLMs) with clustering and human validation to uncover latent, fine-grained themes in social media messaging. It applies the method to climate campaigns and COVID-19 vaccine campaigns, using K-means clustering, Sentence-BERT embeddings, and GPT-4-based coherency checks, summaries, and mappings to iteratively refine themes. Results show improved coverage and high mapping accuracy compared to SBERT, LDA, and BERTopic baselines, with themes exhibiting strong correlations to stances and moral foundations and revealing demographic targeting patterns. The study demonstrates the approach’s scalability, interpretability, and capacity to reveal theme shifts in response to real-world events, offering actionable insights for social science research and policy analysis while acknowledging ethical considerations and potential misuse risks.

Abstract

Grasping the themes of social media content is key to understanding the narratives that influence public opinion and behavior. Thematic analysis goes beyond traditional topic-level analysis, which often captures only the broadest patterns, and provides deeper insight into specific, actionable themes such as "public sentiment towards vaccination" and "political discourse surrounding climate policies." In this paper, we introduce a novel approach to uncovering latent themes in social media messaging. Recognizing the limitations of traditional topic-level analysis, which tends to capture only overarching patterns, this study emphasizes the need for a finer-grained, theme-focused exploration. Traditional theme discovery methods typically involve manual processes and a human-in-the-loop approach. While valuable, these methods face challenges in scalability, consistency, and resource intensity in terms of time and cost. To address these challenges, we propose a machine-in-the-loop approach that leverages the advanced capabilities of Large Language Models (LLMs). To demonstrate our approach, we apply our framework to contentious topics, such as the climate debate and the vaccine debate. We use two publicly available datasets: (1) the climate campaigns dataset of 21k Facebook ads and (2) the COVID-19 vaccine campaigns dataset of 9k Facebook ads. Our quantitative and qualitative analysis shows that our methodology yields more accurate and interpretable results compared to the baselines. Our results not only demonstrate the effectiveness of our approach in uncovering latent themes but also illuminate how these themes are tailored for demographic targeting in social media contexts. Additionally, our work sheds light on the dynamic nature of social media, revealing the shifts in the thematic focus of messaging in response to real-world events.

Paper Structure (32 sections, 11 figures, 7 tables)


Figures (11)

  • Figure 1: Framework overview.
  • Figure 2: Example of an incoherent cluster in Climate.
  • Figure 3: Example of a merged cluster in Climate.
  • Figure 4: Prompt template for coherency check (shown as zero-shot).
  • Figure 5: Prompt template for generating summary (shown as zero-shot).
  • ...and 6 more figures