MindVote: When AI Meets the Wild West of Social Media Opinion
Xutao Mao, Ezra Xuanru Tao, Leyao Wang
TL;DR
MindVote tackles the misalignment between survey-based evaluation of large language models (LLMs) and the context-rich way public opinion forms on social media. It introduces a benchmark built from 3,918 naturalistic polls from Reddit and Weibo, enriched with platform, topic, and time-context metadata, in which models must predict the full opinion distribution over a poll's options rather than only the majority choice. Evaluating 15 LLMs with a JSON-based prompting scheme and four metrics ($1$-Wasserstein Distance, $1$-KL Divergence, Spearman's $\rho$, and accuracy), the study finds that context-aware reasoning is crucial and that models fine-tuned on sanitized survey data underperform in real-world social contexts. The results favor models that leverage social context over those that rely on pattern-matching, establishing MindVote as a practical framework for developing and assessing more socially intelligent AI systems.
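For a single poll, the four metrics named above can be sketched roughly as follows. This is a minimal standard-library sketch under stated assumptions, not MindVote's evaluation code, and the function names are hypothetical. It assumes poll options are ordered and equally spaced (needed to define a Wasserstein distance on a line). It takes KL in the direction KL(true || predicted), with a small smoothing constant. It computes Spearman's $\rho$ over the per-option shares, and it reads "accuracy" as whether the predicted and true distributions share the same top option. The functions return raw values. How the benchmark rescales them, including what the "$1$-" prefix on the distance scores denotes, is not specified here.

```python
import math
from typing import Sequence


def _normalize(dist: Sequence[float]) -> list[float]:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(dist)
    if total <= 0 or any(x < 0 for x in dist):
        raise ValueError("distribution must be non-negative with positive mass")
    return [x / total for x in dist]


def wasserstein_1(true: Sequence[float], pred: Sequence[float]) -> float:
    """W1 distance, ASSUMING options sit at equally spaced positions 0..K-1.

    On a line, W1 equals the summed absolute gap between the two CDFs.
    """
    if len(true) != len(pred):
        raise ValueError("distributions must have the same number of options")
    t, p = _normalize(true), _normalize(pred)
    cdf_t = cdf_p = gap = 0.0
    for a, b in zip(t, p):
        cdf_t += a
        cdf_p += b
        gap += abs(cdf_t - cdf_p)
    return gap


def kl_divergence(true: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """KL(true || pred); eps guards against log(0) when pred gives an option no mass."""
    t, p = _normalize(true), _normalize(pred)
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(t, p) if a > 0)


def _average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(true: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of the option shares."""
    rt, rp = _average_ranks(true), _average_ranks(pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((x - mt) * (y - mp) for x, y in zip(rt, rp))
    var_t = sum((x - mt) ** 2 for x in rt)
    var_p = sum((y - mp) ** 2 for y in rp)
    if var_t == 0 or var_p == 0:
        return float("nan")  # undefined when either side has all-equal shares
    return cov / math.sqrt(var_t * var_p)


def _argmax(dist: Sequence[float]) -> int:
    """Index of the largest share; ties resolve to the first option."""
    return max(range(len(dist)), key=lambda i: dist[i])


def top1_match(true: Sequence[float], pred: Sequence[float]) -> bool:
    """True if both distributions put the most mass on the same option."""
    return _argmax(true) == _argmax(pred)
```

Under this reading, benchmark-level accuracy would be the fraction of polls for which `top1_match` is true. The distance and divergence values would be averaged over polls in the same way.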
Abstract
Large Language Models (LLMs) are increasingly used as scalable tools for pilot testing, that is, for predicting public opinion distributions before costly surveys are deployed. To judge whether they can serve as effective pilot-testing tools, LLMs are typically benchmarked on their ability to reproduce the outcomes of past structured surveys. This evaluation paradigm, however, is misaligned with the dynamic, context-rich social media environments where public opinion is increasingly formed and expressed. By design, surveys strip away the social, cultural, and temporal context that shapes public opinion, and LLM benchmarks built on this paradigm inherit these critical limitations. To bridge this gap, we introduce MindVote, the first benchmark for public opinion distribution prediction grounded in authentic social media discourse. MindVote is constructed from 3,918 naturalistic polls sourced from Reddit and Weibo, spanning 23 topics and enriched with detailed annotations of platform, topical, and temporal context. Using this benchmark, we conduct a comprehensive evaluation of 15 LLMs. MindVote provides a robust, ecologically valid framework for moving beyond survey-based evaluation and advancing the development of more socially intelligent AI systems.
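The sketch below shows one way a poll carrying platform, topical, and temporal annotations might be represented. It also shows how that context could be serialized into a prompt that asks for a JSON distribution, in the spirit of the JSON-based prompting scheme mentioned in the TL;DR. Everything here is hypothetical and for illustration only: the `Poll` fields, the prompt wording, the `{"distribution": [...]}` reply format, and the helper names are assumptions, not MindVote's actual schema or prompt.

```python
import json
from dataclasses import dataclass


@dataclass
class Poll:
    """Hypothetical poll record; field names are illustrative, not MindVote's schema."""
    question: str
    options: list[str]
    platform: str  # platform context, e.g. "Reddit" or "Weibo"
    topic: str     # topical context, e.g. one of the benchmark's topic labels
    posted: str    # temporal context, kept here as an ISO date string
    votes: list[int]  # observed vote counts, one per option

    def target_distribution(self) -> list[float]:
        """Observed vote shares: the ground truth a model is asked to predict."""
        total = sum(self.votes)
        return [v / total for v in self.votes]


def build_prompt(poll: Poll) -> str:
    """Serialize the context fields and the poll, asking for a JSON reply."""
    options = "\n".join(f"{i}. {opt}" for i, opt in enumerate(poll.options))
    return (
        f"Platform: {poll.platform}\n"
        f"Topic: {poll.topic}\n"
        f"Posted: {poll.posted}\n"
        f"Question: {poll.question}\n"
        f"Options:\n{options}\n"
        'Reply with JSON only, e.g. {"distribution": [...]}, giving one '
        "percentage per option in the order listed."
    )


def parse_response(text: str, n_options: int) -> list[float]:
    """Parse a JSON reply and rescale it to a probability distribution."""
    values = [float(x) for x in json.loads(text)["distribution"]]
    if len(values) != n_options or any(v < 0 for v in values) or sum(values) <= 0:
        raise ValueError("reply is not a valid distribution over the options")
    total = sum(values)
    return [v / total for v in values]
```

Because `parse_response` rescales the reply, percentages that do not sum exactly to 100 still yield a valid distribution. A stricter harness might reject such replies instead, which would count formatting failures against the model.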
