Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
Tunazzina Islam, Dan Goldwasser
TL;DR
This work tackles the problem of uncovering latent arguments in social media messaging without relying on fixed labels or manual coding. It presents an iterative LLMs-in-the-Loop framework that clusters the messages within each theme, summarizes each sub-cluster with zero-shot summarization, and prompts LLMs to generate and refine talking points. Redundant talking points are then filtered out, and messages are mapped to the remaining talking points via cosine similarity. The approach is validated on climate campaigns (14k ads, 25 themes) and COVID-19 vaccine campaigns (9k ads, 14 themes). It identifies 113 climate and 47 vaccine talking points in the first pass and adds 100 and 31 more, respectively, in a second pass. Including talking points also improves downstream stance prediction. The study further analyzes demographic targeting and event-driven shifts in talking points, showing that the framework supports scalable, dynamic analysis of public discourse for computational social science (CSS) research and policy analysis.
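The redundancy filtering and cosine-similarity mapping steps can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-made three-dimensional vectors stand in for real sentence embeddings, the 0.9 threshold is an assumed value, the example talking points are invented for illustration, and the helper names (`cosine`, `filter_redundant`, `map_messages`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def filter_redundant(vectors, threshold=0.9):
    """Greedy filter: keep a talking point only if its similarity to every
    already-kept talking point is below the threshold (assumed value)."""
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept

def map_messages(message_vectors, point_vectors):
    """Assign each message to the position of its most similar talking point."""
    assignment = []
    for msg in message_vectors:
        sims = [cosine(msg, point) for point in point_vectors]
        assignment.append(sims.index(max(sims)))
    return assignment

# Toy talking points with hand-made "embeddings" (assumption: a real pipeline
# would obtain these from a sentence-embedding model).
talking_points = [
    "Clean energy creates jobs",
    "Renewables create jobs",            # near-duplicate of the first
    "Fossil fuels keep energy affordable",
]
point_vectors = [
    [1.0, 0.0, 0.1],
    [0.95, 0.05, 0.1],
    [0.0, 1.0, 0.2],
]

kept = filter_redundant(point_vectors, threshold=0.9)
kept_vectors = [point_vectors[i] for i in kept]

# Two toy message vectors, mapped to positions in the filtered list.
message_vectors = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
assignment = map_messages(message_vectors, kept_vectors)

for msg_idx, point_pos in enumerate(assignment):
    print(f"message {msg_idx} -> {talking_points[kept[point_pos]]}")
```

The greedy filter depends on input order: the first of two near-duplicates is kept. With real embeddings the threshold would need tuning against manually inspected pairs.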
Abstract
The widespread use of social media has led to a surge in popularity for automated methods of analyzing public opinion. Supervised methods are adept at text categorization, yet the dynamic nature of social media discussions poses a continual challenge for these techniques because the focus of discussion shifts constantly. On the other hand, traditional unsupervised methods for extracting themes from public discourse, such as topic modeling, often reveal overarching patterns that might not capture specific nuances. Consequently, a significant portion of research into social media discourse still depends on labor-intensive manual coding techniques and a human-in-the-loop approach, which are both time-consuming and costly. In this work, we study the problem of discovering arguments associated with a specific theme. We propose a generic LLMs-in-the-Loop strategy that leverages the advanced capabilities of Large Language Models (LLMs) to extract latent arguments from social media messaging. To demonstrate our approach, we apply our framework to contentious topics. We use two publicly available datasets: (1) the climate campaigns dataset of 14k Facebook ads with 25 themes and (2) the COVID-19 vaccine campaigns dataset of 9k Facebook ads with 14 themes. Additionally, we design stance prediction as a downstream task that leverages talking points in climate debates. Furthermore, we analyze demographic targeting and the adaptation of messaging based on real-world events.
