Are Large Language Models (LLMs) Good Social Predictors?

Kaiqi Yang, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu

TL;DR

This paper challenges the notion that large language models are robust social predictors by showing that strong performance in prior voting-prediction studies hinges on input shortcuts. It introduces Soc-PRF Prediction, a real-world, zero-shot evaluation using Gallup World Poll features split into low- and high-mutability groups, and demonstrates that LLMs perform roughly at chance without shortcuts. Through analyses of feature correlations and multiple evaluation settings, the work reveals limitations in relying on general input features for individual-level social predictions and discusses potential avenues, such as supervised signals, reference information, and feature enrichment, to enhance LLM capabilities. The findings highlight the gap between population-level linguistic knowledge in LLMs and individualized social prediction, with implications for methodology and ethics in deploying LLMs in social science research.

Abstract

Prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance reported in previous studies stems from shortcut features in the input that are closely tied to the response. Once these shortcuts are removed, performance drops dramatically. To further revisit the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which uses general features as input and simulates real-world social study settings. Through comprehensive investigations of various LLMs, we find that LLMs do not work as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon, which suggest potential ways to enhance LLMs for social prediction.

Paper Structure (19 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The performance of voting prediction. Here, GPT stands for the LLM-based approach, for which we choose GPT 3.5 following Argyle et al. (2022). Full indicates the setting with all input features, and w/o shortcut indicates the setting without the two shortcut features.
  • Figure 2: Correlation between features in the dataset. The metric is Cramér's V, and values close to 1 indicate strong correlations.
  • Figure 3: Performance of Random Forest and Random Guessing. The metric is AUC.
  • Figure 4: Performance of GPT 3.5 in the high2high setting. The metric is AUC. The sign "-" indicates no valid data, either because the input features (Y-axis) and output features (X-axis) share the same topic, or because they were not surveyed at the same time.
  • Figure 5: Distributions of Social Features. Note that the last two features (civic engagement behaviors) are more mutable than the first two.
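
The Figure 2 caption names Cramér's V as the correlation metric between categorical features. As a minimal sketch of how such a value can be computed, the snippet below implements the standard uncorrected form, V = sqrt(chi2 / (n * (k - 1))), where k is the smaller of the two category counts. The function name `cramers_v`, the toy feature values, and the choice of the uncorrected variant are assumptions made for illustration; the listing does not state which variant or implementation the authors used.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row = Counter(xs)              # marginal counts of the first feature
    col = Counter(ys)              # marginal counts of the second feature
    k = min(len(row), len(col))
    if k < 2:
        # Convention chosen here: a constant feature has no association.
        return 0.0
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data: a feature that fully determines the response
# (a "shortcut") versus one that is statistically independent of it.
response = ["yes", "no", "yes", "no"]
shortcut_like = ["A", "B", "A", "B"]
unrelated = ["u", "u", "v", "v"]

v_shortcut = cramers_v(shortcut_like, response)   # 1.0
v_unrelated = cramers_v(unrelated, response)      # 0.0
```

A value near 1, as in the first toy pair, is the kind of strong input-response association that the abstract describes as a shortcut, while a value near 0 corresponds to a feature that carries little information about the response on its own.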