LLMs Can Infer Political Alignment from Online Conversations

Byunghwee Lee, Sangyeon Kim, Filippo Menczer, Yong-Yeol Ahn, Haewoon Kwak, Jisun An

Abstract

Because traits such as identities, cultures, and political attitudes are correlated, seemingly innocuous preferences, such as following a band or using specific slang, can reveal private traits. This possibility, especially when combined with massive, public social data and advanced computational methods, poses a fundamental privacy risk. Because our increasing data exposure online and the rapid advancement of AI are increasing the potential for misuse, it is critical to understand the capacity of large language models (LLMs) to exploit this risk. Here, using online discussions on Debate.org and Reddit, we show that LLMs can reliably infer hidden political alignment, significantly outperforming traditional machine learning models. Prediction accuracy further improves as we aggregate multiple text-level inferences into a user-level prediction, and as we use more politics-adjacent domains. We demonstrate that LLMs leverage words that can be highly predictive of political alignment while not being explicitly political. Our findings underscore the capacity and risks of LLMs for exploiting socio-cultural correlates.

Paper Structure (19 sections, 10 equations, 22 figures, 11 tables)

Figures (22)

  • Figure 1: LLMs can reliably infer political alignment from general conversations. (a) Illustration of the political alignment inference process using LLMs applied to user-generated texts. Text categories can be political (blue) or general (orange). (b-c) Relationship between average text-level F1 score for the political alignment inference task and LLM confidence scores for (b) DDO and (c) Reddit. Error bars represent standard error estimated via bootstrapping (n = 1,000). The average F1 score increases with the LLM's confidence in the text-level inference. (d-e) Distribution of confidence for two datasets, (d) DDO and (e) Reddit, as reported by the two LLMs. (f-i) Accuracy of LLMs on the political alignment inference task for DDO and Reddit datasets. User-level F1 scores are calculated for three text contexts: general, political, and combined. Results are presented for two LLMs (GPT-4o and Llama-3.1-8B) across three aggregation methods (see main text): (1) majority vote, (2) confidence-weighted average, and (3) maximum-confidence average. Text-level F1 scores are also shown for comparison. Panels are divided by dataset (DDO and Reddit) and model. Across all datasets and LLMs, user-level aggregation methods that incorporate LLM confidence scores substantially improve inference accuracy. Among the three methods, the maximum-confidence average tends to yield the highest performance, highlighting the effectiveness of leveraging highly confident predictions during inference. All differences among aggregation methods are statistically significant according to paired-sample t-tests on bootstrapped samples ($p<0.01$).
  • Figure 2: Inference accuracy varies systematically across topical categories. (a, b) User-level F1 scores (using the maximum-confidence method) for political alignment inference across categories in (a) DDO and (b) Reddit using GPT-4o and Llama-3.1-8B. As a reference, the gray dotted lines in this and following figures indicate F1=0.5, the expected macro F1 score from a random classifier with balanced classes. (c, d) Correlations of user-level F1 scores by category between GPT-4o and Llama-3.1-8B for (c) DDO and (d) Reddit. Circle size is proportional to the number of observations in each category. (e, f) Correlations of user-level F1 scores by category between DDO and Reddit for (e) GPT-4o and (f) Llama-3.1-8B. Prediction performance varies systematically across categories, independent of dataset or model.
  • Figure 3: Semantic similarity and user overlap with "Politics" predict higher inference performance across categories. Panels (a--d) show similarity measurements between general categories and the "Politics" category in the DDO dataset, based on (a) content embedding similarity, (b) debate title embedding similarity, (c) Jaccard similarity of user participation between categories, and (d) normalized pointwise mutual information (NPMI) of user participation. Content and title embedding similarities capture semantic similarity between categories, computed as the cosine similarity between category-level average embedding vectors obtained from a pre-trained Sentence-BERT model (sentence-transformers/all-mpnet-base-v2 [song2020mpnet]). In contrast, Jaccard similarity and NPMI reflect user engagement overlap, quantifying the extent to which users participate in both categories. Jaccard similarity ranges from 0 (no shared users) to 1 (identical user sets), and NPMI ranges from $-1$ (no shared users) to $+1$ (identical user sets), with zero indicating independence. Panels (e--h) show the same comparisons for the Reddit dataset, except that panel (f) uses the similarity of average subreddit description embeddings between categories instead of title embeddings. GPT-4o is used for inference in all cases. Overall, categories with greater semantic similarity and user overlap with the "Politics" category tend to yield more accurate inference of political alignment.
  • Figure 4: Word-level confidence reveals politicization patterns in general-topic categories. (a, b) Six illustrative categories are shown: (a) Economics, Health, and Science from the DDO dataset, and (b) Cars, Entertainment, and Music from the Reddit dataset. Within each category, word clouds are arranged left to right by quintiles of word-level confidence, from lowest (Q1) to highest (Q5). Word-level confidence is defined as the mean confidence, averaged over all texts containing a given word, in inferring political alignment using GPT-4o. Higher confidence indicates stronger politicization signals, contributing more to LLM inference accuracy; even general-topic words in the Q5 clouds carry partisan signals. To visualize lexical distinctiveness, word size reflects the standardized log-odds $z$-score estimated using the Dirichlet prior method [monroe2008fightin]. Within each confidence quintile, we display up to 100 words with the highest $z$-scores. Word color indicates the relative partisan ratio $f \in [0,1]$, where red ($f=1$) indicates exclusive usage by Republicans and blue ($f=0$) indicates exclusive usage by Democrats (see Methods). (c, d) Relationship between word-level confidence and mean F1 score for political alignment inference within each category. Each point represents one of 15 confidence quantiles, with shaded areas indicating 95% confidence intervals. Across categories, higher-confidence words consistently yield more accurate inference. These results demonstrate that LLMs effectively capture both implicit and explicit lexical cues relevant to political alignment.
  • Figure S1: (a) Performance of five annotators who independently inferred each user’s political alignment (Republican / Democrat) from 50 randomly sampled comments. Bars show individual accuracy and macro F1 scores, demonstrating consistently high classification performance across annotators. (b) Comparison between the mean individual accuracy and the majority-vote accuracy against the true labels. The high majority-vote accuracy (0.92) and mean annotator accuracy (0.85) indicate that the heuristic labeling of a user's political alignment based on community activity is largely consistent with human judgment. (c) Inter-annotator agreement measured by pairwise Cohen's $\kappa$ coefficients, showing a moderate level of consensus among annotators (mean $\kappa$ = 0.576).
  • ...and 17 more figures
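
The two user-overlap measures named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the standard definitions: Jaccard similarity as intersection over union of two user sets, and NPMI as pointwise mutual information divided by $-\log p(a,b)$. The paper's exact computation is not given in this excerpt, so the function names, the toy user IDs, and the handling of edge cases below are assumptions made for illustration only.

```python
import math


def jaccard(users_a, users_b):
    """Jaccard similarity of two user sets: |A & B| / |A | B|."""
    a, b = set(users_a), set(users_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty categories share nothing
    return len(a & b) / len(union)


def npmi(users_a, users_b, n_users):
    """Normalized PMI of participating in both categories.

    Probabilities are estimated as fractions of n_users, the total
    number of users in the dataset (an assumed normalization).
    """
    a, b = set(users_a), set(users_b)
    joint = len(a & b)
    if joint == 0:
        return -1.0  # limit value when the categories share no users
    p_a = len(a) / n_users
    p_b = len(b) / n_users
    p_ab = joint / n_users
    if p_ab == 1.0:
        return 1.0  # degenerate case: every user is in both categories
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)


# Hypothetical toy data: user IDs active in each category.
politics = {"u1", "u2", "u3", "u4"}
science = {"u3", "u4", "u5"}

overlap_jaccard = jaccard(politics, science)
overlap_npmi = npmi(politics, science, n_users=10)
```

With these definitions, disjoint user sets give an NPMI of $-1$, identical sets give $+1$, and a joint participation rate equal to the product of the two marginal rates gives 0, matching the ranges stated in the caption.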