Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy

Tunazzina Islam, Dan Goldwasser

TL;DR

This work addresses the problem of uncovering latent arguments in social media messaging without relying on fixed label sets or manual coding. It presents an iterative LLMs-in-the-Loop framework that clusters the messages belonging to each theme, summarizes each sub-cluster with zero-shot summarization, and prompts LLMs to generate and refine talking points. Redundant talking points are then filtered out, and messages are mapped to talking points via cosine similarity. The approach is evaluated on climate campaigns (14k ads, 25 themes) and COVID-19 vaccine campaigns (9k ads, 14 themes). It identifies 113 climate and 47 vaccine talking points after the first pass and adds 100 and 31 more, respectively, after a second pass; including talking points also improves downstream stance prediction. The study further analyzes demographic targeting and event-driven shifts in talking points, showing how the framework supports scalable, dynamic analysis of public discourse for computational social science (CSS) research and policy analysis.
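The redundancy-filtering and mapping steps mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 similarity threshold, the toy three-dimensional vectors, and the example talking-point texts are all assumptions made for illustration. In practice the vectors would come from a sentence-embedding model, which is not specified in this summary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def filter_redundant(points, threshold=0.9):
    """Keep a talking point only if it is not too similar to one already kept.

    `points` is a list of (text, vector) pairs. The threshold is a
    hypothetical value, not one taken from the paper.
    """
    kept = []
    for text, vec in points:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept

def map_to_talking_point(message_vec, kept):
    """Return the text of the kept talking point closest to the message."""
    return max(kept, key=lambda p: cosine(message_vec, p[1]))[0]

# Toy data: hypothetical talking points with made-up embedding vectors.
talking_points = [
    ("Clean energy creates jobs", [1.0, 0.0, 0.0]),
    ("Renewables create jobs", [0.98, 0.1, 0.0]),   # near-duplicate of the first
    ("Protect public lands", [0.0, 1.0, 0.0]),
]

kept = filter_redundant(talking_points)
kept_texts = [text for text, _ in kept]
assigned = map_to_talking_point([0.1, 0.9, 0.0], kept)
print(kept_texts)
print(assigned)
```

Filtering greedily in input order means the first-seen phrasing of an argument is the one that survives; a different ordering or threshold would keep a different representative.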

Abstract

The widespread use of social media has led to a surge in popularity for automated methods of analyzing public opinion. Supervised methods are adept at text categorization, yet the dynamic nature of social media discussions poses a continual challenge for these techniques due to the constant shifting of the focus. On the other hand, traditional unsupervised methods for extracting themes from public discourse, such as topic modeling, often reveal overarching patterns that might not capture specific nuances. Consequently, a significant portion of research into social media discourse still depends on labor-intensive manual coding techniques and a human-in-the-loop approach, which are both time-consuming and costly. In this work, we study the problem of discovering arguments associated with a specific theme. We propose a generic LLMs-in-the-Loop strategy that leverages the advanced capabilities of Large Language Models (LLMs) to extract latent arguments from social media messaging. To demonstrate our approach, we apply our framework to contentious topics. We use two publicly available datasets: (1) the climate campaigns dataset of 14k Facebook ads with 25 themes and (2) the COVID-19 vaccine campaigns dataset of 9k Facebook ads with 14 themes. Additionally, we design a downstream task as stance prediction by leveraging talking points in climate debates. Furthermore, we analyze demographic targeting and the adaptation of messaging based on real-world events.

Paper Structure (32 sections, 7 figures, 9 tables)

Figures (7)

  • Figure 1: LLMs-in-the-Loop framework. TP: Talking point.
  • Figure 2: Prompt templates (shown as zero-shot).
  • Figure 3: Prompt example of summarizing top-5 instances under patriotism theme of climate campaign dataset. The black colored segment is the input prompt and the red colored segment is the generated output by the LLMs.
  • Figure 4: Prompt example of generating talking point from a summary of top-5 instances under patriotism theme of climate campaign dataset. The black colored segment is the input prompt and the red colored segment is the generated output by the LLMs.
  • Figure 5: Correlations between arguments & moral foundations for COVID-19; arguments & stances for climate.
  • ...and 2 more figures