Are Large Language Models (LLMs) Good Social Predictors?
Kaiqi Yang, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu
TL;DR
This paper challenges the notion that large language models are robust social predictors by showing that strong performance in prior voting-prediction studies hinges on input shortcuts. It introduces Soc-PRF Prediction, a real-world, zero-shot evaluation using Gallup World Poll features split into low- and high-mutability groups, and demonstrates that LLMs perform roughly at chance without shortcuts. Through analyses of feature correlations and multiple evaluation settings, the work reveals limitations in relying on general input features for individual-level social predictions and discusses potential avenues, such as supervised signals, reference information, and feature enrichment, to enhance LLM capabilities. The findings highlight the gap between population-level linguistic knowledge in LLMs and individualized social prediction, with implications for methodology and ethics in deploying LLMs in social science research.
Abstract
Prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by previous studies stems from input features that act as shortcuts to the response. In fact, once these shortcuts are removed, performance drops dramatically. To re-examine the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which uses general features as input and simulates real-world social study settings. Through comprehensive investigations of various LLMs, we reveal that LLMs do not perform as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon, which suggest potential ways to enhance LLMs for social prediction.
