
Popular LLMs Amplify Race and Gender Disparities in Human Mobility

Xinhua Wu, Qi R. Wang

Abstract

As large language models (LLMs) are increasingly applied in areas that influence societal outcomes, it is critical to understand their tendency to perpetuate and amplify biases. This study investigates whether LLMs exhibit race- and gender-based biases when predicting human mobility, a fundamental human behavior. Using three prominent LLMs (GPT-4, Gemini, and Claude), we analyzed their predictions of individuals' visits to points of interest (POIs), with prompts that included names with and without explicit demographic details. We find that LLMs frequently reflect and amplify existing societal biases. Specifically, predictions for minority groups were disproportionately skewed: individuals from these groups were significantly less likely to be associated with wealth-related POIs. Gender biases were also evident, as female individuals were consistently linked to fewer career-related POIs than their male counterparts. These biased associations suggest that LLMs not only mirror but also exacerbate societal stereotypes, particularly in contexts involving race and gender.
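The prompt manipulation and the group-level comparison described above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the prompt template is hypothetical (only the name-only form "John" and the explicit form "John, a white male" appear in the figure captions), the helper names `build_prompt` and `category_shares` are illustrative, and the LLM call itself is omitted, so model outputs are passed in as already-categorized `(group, category)` pairs.

```python
from collections import Counter, defaultdict

# Hypothetical template: the paper's exact prompt wording is not reproduced here.
TEMPLATE = "{subject} visited a place today. Which point of interest did they most likely visit?"

def build_prompt(name, race=None, gender=None):
    """Return the name-only prompt, or one with an explicit demographic appositive
    such as 'John, a white male, ...' when race and/or gender are given."""
    descriptor = " ".join(part for part in (race, gender) if part)
    if not descriptor:
        return TEMPLATE.format(subject=name)
    # Choose the article from the first letter of the descriptor ("an Asian", "a Black").
    article = "an" if descriptor[0].lower() in "aeiou" else "a"
    return TEMPLATE.format(subject=f"{name}, {article} {descriptor},")

def category_shares(records):
    """records: iterable of (group, category) pairs, one per LLM prediction.
    Returns {group: {category: share}}; shares within each group sum to 1."""
    counts = defaultdict(Counter)
    for group, category in records:
        counts[group][category] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {cat: n / total for cat, n in counter.items()}
    return shares
```

Computing `category_shares` once for name-only prompts and once for prompts with explicit demographics, then comparing, for example, each group's `"wealth-related"` share, yields per-group probabilities of roughly the kind plotted in Figure 1(b).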

Paper Structure

This paper contains 3 sections, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Design and results of Experiment I. (a) An illustrative example of Experiment I. The left panel shows the prompt and output when only the name "John" is provided, resulting in a prediction of "Grocery store." In contrast, the right panel adds explicit demographic information ("John, a white male") to the prompt, which sometimes leads to a different prediction, e.g., "Career consultation center." (b) Predicted probability, from GPT-4o, of visiting each of the four POI categories (career-related, everyday-needs, wealth-related, and poverty-related) across eight demographic groups. Results for Gemini-1.5-pro and Claude-3.5-sonnet are shown in Figure S1.
  • Figure 2: Logistic regression coefficients showing the influence of name-based and explicit demographic features (race, gender) on LLM predictions of POI visits. The rows represent these features: Female_ratio (Name), Black_ratio (Name), Hispanic_ratio (Name), Asian_ratio (Name), Black, Hispanic, Asian, and Female. The first three columns show the coefficients from the first model, which has no explicit demographic features; the last three columns show the coefficients from the second model, which includes explicit race and gender labels. The colorbar indicates the strength and direction of the association (blue: negative, gray: neutral, red: positive); white cells mark statistically insignificant coefficients.
  • Figure 3: Design and results of Experiment II. (a) An illustrative example of Experiment II. The left panel shows the prompt and output when only the names "John" and "Maricela" are provided, resulting in predictions that "John" visited "Resort" and "Soup Kitchen," while "Maricela" visited "Art Gallery" and "Thrift Store." In contrast, the right panel adds explicit demographic information to the prompt, which changes the prediction for "Maricela" (she now visits "Soup Kitchen" and "Thrift Store"). (b) Distribution of GPT-4o's POI predictions across demographic subgroups. Each cell gives the percentage probability that the corresponding POI visit is attributed to a person in that demographic group (each row sums to 100%). The colorbar ranges from 0% (blue) to 25% (red).
  • Figure 4: LLMs' predicted disparities in gender and race. (a) Percentage of visits to career-related POIs predicted by GPT-4o with and without explicit gender information. The blue bars represent predictions based on the individual's name alone ("Gender not specified"), while the orange bars represent predictions when gender is explicitly included in the prompt ("Female specified"). (b) Percentage of visits to wealth-related POIs predicted by GPT-4o with and without explicit racial information. The red bars represent predictions based on the individual's name alone ("Race not specified"), while the green bars represent predictions when race is explicitly included in the prompt ("Race specified").
  • Figure S1: POI visit distribution from large language models in Experiment I. Rejected responses are not included.
  • ...and 2 more figures
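The regression summarized in Figure 2 can be illustrated with a plain binary logistic model. The sketch below is an assumption-laden illustration, not the paper's specification (the outcome coding and any baseline category are not given here): it models whether a prediction falls in one target POI category as a function of a feature row, such as the name-based Female_ratio, Black_ratio, Hispanic_ratio, and Asian_ratio, or 0/1 indicators for explicit Black, Hispanic, Asian, and Female labels. The names `sigmoid` and `fit_logistic` and the learning-rate and epoch defaults are hypothetical. It reports point estimates only and computes no standard errors, so on its own it cannot flag insignificant coefficients the way the white cells in Figure 2 do.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + w . x) by batch gradient descent
    on the mean log-loss.

    X: list of feature rows, e.g. [female_ratio, black_ratio, hispanic_ratio, asian_ratio]
       (hypothetical name-based features), or 0/1 explicit-label indicators.
    y: list of 0/1 labels, 1 if the LLM's predicted POI is in the target category.
    Returns (intercept, list of coefficients). No standard errors are computed.
    """
    n, d = len(X), len(X[0])
    b0, w = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for row, label in zip(X, y):
            z = b0 + sum(wj * xj for wj, xj in zip(w, row))
            err = sigmoid(z) - label  # derivative of the log-loss with respect to z
            g0 += err
            for j in range(d):
                g[j] += err * row[j]
        # Step against the gradient averaged over all n rows.
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w
```

A negative fitted weight means that a higher feature value lowers the predicted log-odds of the target category, and exponentiating a weight gives the corresponding odds ratio. For inference rather than illustration, a statistics package that also reports standard errors, such as statsmodels' `Logit`, would be the more natural choice.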