Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys

Mingmeng Geng; Sihong He; Roberto Trotta

TL;DR

A comparison of different LLM responses with the European Social Survey data suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases.

Abstract

Can large language models (LLMs) simulate social surveys? To answer this question, we conducted millions of simulations in which LLMs were asked to answer subjective questions. A comparison of different LLM responses with the European Social Survey (ESS) data suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases. We further discuss statistical methods for measuring the difference between LLM answers and survey data and propose a novel measure inspired by Jaccard similarity, since LLM-generated responses are likely to have a smaller variance than the survey responses. Our experiments also reveal that it is important to analyze the robustness and variability of prompts before using LLMs to simulate social surveys, as their imitation abilities are approximate at best.

Paper Structure (50 sections, 2 equations, 9 figures, 5 tables)

Introduction
Related work
LLM simulation
LLM bias
LLM evaluation
LLM alignment
Prompt engineering
Data
Methods
Simulations settings
Models
Prompts
Responses
Simplifications
Parameters
...and 35 more sections

Figures (9)

Figure 1: LLM simulation with different prompts.
Figure 2: Q1: "Gays and lesbians free to live life as they wish"? Prompt: P1. The points represent the mean and the error bars represent the standard deviation (and the same for the next figures). Model: GPT-3.5.
Figure 3: Q2: "Government should reduce differences in income levels"? Prompt: P1. Model: GPT-3.5.
Figure 4: Q1: "Gays and lesbians free to live life as they wish"? Prompt: P2. Model: GPT-3.5.
Figure 5: Comparisons between survey data and simulation results based on GPT-3.5.
...and 4 more figures
