Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys

Mingmeng Geng, Sihong He, Roberto Trotta

TL;DR

A comparison of different LLM responses with the European Social Survey data suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases.

Abstract

Can large language models (LLMs) simulate social surveys? To answer this question, we conducted millions of simulations in which LLMs were asked to answer subjective questions. A comparison of different LLM responses with the European Social Survey (ESS) data suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases. We further discuss statistical methods for measuring the difference between LLM answers and survey data and propose a novel measure inspired by Jaccard similarity, since LLM-generated responses are likely to have a smaller variance than the survey data. Our experiments also reveal that it is important to analyze the robustness and variability of prompts before using LLMs to simulate social surveys, as their imitation abilities are approximate at best.

Paper Structure (50 sections, 2 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: LLM simulation with different prompts.
  • Figure 2: Q1: "Gays and lesbians free to live life as they wish"? Prompt: P1. The points represent the mean and the error bars represent the standard deviation (and the same for the next figures). Model: GPT-3.5.
  • Figure 3: Q2: "Government should reduce differences in income levels"? Prompt: P1. Model: GPT-3.5.
  • Figure 4: Q1: "Gays and lesbians free to live life as they wish"? Prompt: P2. Model: GPT-3.5.
  • Figure 5: Comparisons between survey data and simulation results based on GPT-3.5.
  • ...and 4 more figures