Representation Bias in Political Sample Simulations with Large Language Models
Weihong Qi, Hanjia Lyu, Jiebo Luo
TL;DR
The paper addresses representation bias in using LLMs to simulate political samples, focusing on vote choice and public opinion. It applies GPT-3.5-Turbo to data from ANES, GLES, Zuobiao, and CFPS, and evaluates biases across language, demographics, and regime type using an Agreement Score, defined as $\text{Agreement Score} = \frac{\sum_i S_{i,\text{Agree}}}{S_{\text{total}}}$, where $S_{i,\text{Agree}}$ indicates match quality. Key findings show higher accuracy for vote choice than for public opinion, with stronger performance in English-speaking, bipartisan, and democratic contexts and for older age groups; non-English, multi-party, and autocratic contexts yield poorer results, especially for Chinese samples. The results highlight biases in AI-driven social science simulations and motivate diversified multilingual training data and methodological improvements to enhance fairness across political contexts.
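As a rough illustration of the Agreement Score formula above, the following minimal sketch computes the share of simulated responses that match the observed ones. It rests on assumptions not stated in this summary: that $S_{i,\text{Agree}}$ is a binary indicator (1 if the simulated response for sample $i$ equals the observed response, 0 otherwise) and that $S_{\text{total}}$ is the total number of samples. The function name `agreement_score` and the example labels are hypothetical, not taken from the paper.

```python
from typing import Sequence


def agreement_score(simulated: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of samples whose simulated response equals the observed one.

    Assumption: S_{i,Agree} is treated as a 0/1 exact-match indicator and
    S_total as the number of samples; the paper may define these differently.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one sample is required")
    # Sum of the per-sample match indicators S_{i,Agree}.
    matches = sum(1 for s, o in zip(simulated, observed) if s == o)
    return matches / len(observed)


# Hypothetical vote-choice labels, for illustration only.
simulated_votes = ["A", "B", "A", "A"]
observed_votes = ["A", "B", "B", "A"]
score = agreement_score(simulated_votes, observed_votes)  # 3 of 4 match: 0.75
```

Under this reading, the score can be computed separately for each subgroup (for example by country or age group) and then compared across groups.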
Abstract
This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao Dataset, and China Family Panel Studies to simulate voting behaviors and public opinions. This methodology enables us to examine three types of representation bias: disparities based on the country's language, demographic groups, and political regime types. The findings reveal that simulation performance is generally better for vote choice than for public opinions, more accurate in English-speaking countries, more effective in bipartisan systems than in multi-partisan systems, and stronger in democratic settings than in authoritarian regimes. These results contribute to our understanding of biases in AI applications within the field of computational social science and to the development of strategies to mitigate them.
