Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models

Georg Ahnert; Max Pellert; David Garcia; Markus Strohmaier

TL;DR

The paper introduces Temporal Adapters to create temporally aligned LLMs for longitudinal analysis of social media, enabling extraction of affect aggregates and public attitudes from weekly data. By fine-tuning on weekly timelines from a British Twitter panel and prompting with established survey instruments, the method yields longitudinal macroscopes of emotions and attitudes that correlate with YouGov survey data (Britain's Mood, PANAS-X, NHS, Boris Johnson, government attitudes). The approach is flexible, data-efficient, and robust across seeds and prompts, contrasting with traditional classifiers that require labeled data or dictionaries. It demonstrates internal validity via synthetically mixed data and extends to attitudinal aggregates, providing a scalable tool for timely, population-representative insights in crises and beyond.

Abstract

This paper proposes temporally aligned Large Language Models (LLMs) as a tool for longitudinal analysis of social media data. We fine-tune Temporal Adapters for Llama 3 8B on full timelines from a panel of British Twitter users, and extract longitudinal aggregates of emotions and attitudes with established questionnaires. We focus our analysis on the beginning of the COVID-19 pandemic that had a strong impact on public opinion and collective emotions. We validate our estimates against representative British survey data and find strong positive, significant correlations for several collective emotions. The obtained estimates are robust across multiple training seeds and prompt formulations, and in line with collective emotions extracted using a traditional classification model trained on labeled data. We demonstrate the flexibility of our method on questions of public opinion for which no pre-trained classifier is available. Our work extends the analysis of affect in LLMs to a longitudinal setting through Temporal Adapters. It enables flexible, new approaches towards the longitudinal analysis of social media data.
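The Figure 1 caption describes the extraction step as reading affect aggregates from the answer options' token probabilities. A minimal sketch of one way this could work is shown here, assuming that a per-option log-probability is already available for each week and that the options are renormalized to sum to one. The function name, the option labels, and the numbers are hypothetical, and the summary does not state whether the authors renormalize in this way.

```python
import math


def answer_probabilities(option_logprobs):
    """Renormalize per-option log-probabilities so that they sum to 1.

    Assumption: each value is the model's log-probability of the token
    that starts the corresponding answer option.
    """
    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(option_logprobs.values())
    weights = {opt: math.exp(lp - peak) for opt, lp in option_logprobs.items()}
    total = sum(weights.values())
    return {opt: w / total for opt, w in weights.items()}


# Hypothetical log-probabilities for one week's adapter (illustrative only).
week_logprobs = {"happy": -1.2, "sad": -2.3, "stressed": -1.9}
probs = answer_probabilities(week_logprobs)
```

Repeating this for each weekly adapter would give one probability per answer option per week, which is the kind of time series the figures compare against survey data.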

Paper Structure (34 sections, 16 figures, 1 table)

This paper contains 34 sections, 16 figures, 1 table.

Introduction
Problem
Approach
Contribution
Related Work
Attitudes, Opinions, and Values in LLMs
Temporal Adaptation of Language Models
Simulating Survey Participants with LLMs
Measuring Affect in Surveys
Felt Emotions and Reactivity
Social Media Affect Macroscopes
Sampling Biases and Performative Behavior
Longitudinal Datasets
Twitter Panel Data
Questionnaires and Survey Data
...and 19 more sections

Figures (16)

Figure 1: Illustration of Temporal Adapters. First, we gather weekly text data from a panel of Twitter users and fine-tune Temporal Adapters for Llama 3 8B with it. Then, we prompt the fine-tuned model with established survey questions, one week at a time, and extract affect aggregates from the answer options' token probabilities. Temporal Adapters enable longitudinal analyses of affect aggregates from social media data by temporally aligning LLMs.
Figure 2: Temporal Adapter Fine-Tuning. We concatenate each week's tweets into chunks for batch-based fine-tuning. While each week's parameter-efficient LoRA adapter (Hu et al., 2021) is fine-tuned, the original model weights are kept frozen. Fine-tuning uses the causal language modeling task, i.e., next-token prediction on the original tweets, independent of any survey data.
Figure 3: Affect Aggregates Extracted from Temporal Adapters. We extract answer probabilities by prompting a weekly fine-tuned Llama 3 8B with the same question wording as in the survey (YouGov, 2024), and compare them to the respective weekly survey data, as well as to a model that classifies selected emotions in tweets directly (TweetNLP). The time series are min-max normalized and a 3-week rolling average is applied. The shaded orange area indicates minimum and maximum LLM answer probabilities across 3 training seeds. The plot shows a similar trend in our estimate and the survey data, and we find strong, positive, and significant ($p<0.01$) cross-correlation between LLM probabilities and the survey data. Additional time series are provided in three further figures (full time series, parts 1 to 3) in the Appendix.
Figure 4: Several Extracted Emotions Highly Correlate with Survey Data. We cross-correlate the answers we extract from Llama 3 with the respective British survey data (YouGov, 2024). Across 3 training seeds, we show minimum and maximum correlation with error bars and indicate the worst $p$ value (* $p<0.05$, ** $p<0.01$, *** $p<0.001$). Our results vary strongly between emotions, but they are in line with TweetNLP's estimates. Unlike TweetNLP, our method can be applied flexibly to extract any emotion, as it does not require labeled training data.
Figure 5: Internal Validity Demonstrated in Experiments with Synthetically Mixed Data. We synthetically mix LLM training data with splits ranging from data that is labeled $100\%$ sad to $100\%$ happy. We then extract answers to the YouGov (2024) survey question at each split, and show mean and standard deviation over 10 training seeds. The results support the internal validity of our extraction method: they show a surprisingly linear relationship between training data ratio and extracted estimate, along with some random training error.
...and 11 more figures
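The post-processing named in the Figure 3 caption (min-max normalization, a 3-week rolling average, then correlation with survey data) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the rolling window is assumed to be trailing, the correlation is a plain Pearson coefficient at lag 0 rather than a full cross-correlation over lags, and both weekly series are made-up numbers.

```python
import math


def min_max(series):
    """Scale a series to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # a constant series carries no trend
    return [(x - lo) / (hi - lo) for x in series]


def rolling_mean(series, window=3):
    """Trailing rolling mean (assumed); early points use the values available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(xs, ys):
    """Pearson correlation of two equally long series (lag 0 only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical weekly values: LLM answer probabilities and survey percentages.
llm_weekly = [0.10, 0.12, 0.20, 0.18, 0.25, 0.22]
survey_weekly = [30, 33, 41, 40, 47, 45]

# Same order as in the caption: normalize first, then smooth.
llm_smooth = rolling_mean(min_max(llm_weekly), window=3)
survey_smooth = rolling_mean(min_max(survey_weekly), window=3)
r = pearson(llm_smooth, survey_smooth)
```

Min-max normalization puts the two series on a common scale for plotting, since the LLM output is a probability and the survey data is a share of respondents. The Pearson coefficient itself is unaffected by that rescaling.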
