On the Mutual Influence of Gender and Occupation in LLM Representations

Haozhe An, Connor Baumler, Abhilasha Sancheti, Rachel Rudinger

TL;DR

This paper investigates how LLM representations of gender for first names interact with occupational contexts, proposing that gender is encoded along a latent direction whose orientation can shift with context. By deriving a gender direction from gendered word embeddings and validating it across four open-source LLMs, the authors show that first-name femininity in embeddings correlates with real-world gender statistics and changes when occupations are mentioned. In a downstream occupation-prediction task, internal gender representations partially explain biased predictions, with stronger alignment when names’ perceived gender matches occupation stereotypes, yet intrinsic and extrinsic biases do not always align. The work highlights the interpretability of internal gender representations, but also emphasizes limitations in predicting or mitigating bias, calling for broader demographic coverage, larger-model analyses, and future mitigation strategies. Overall, the study advances understanding of how gender-occupation stereotypes arise in LLMs and points to directions for more inclusive and responsible language technologies, formalizing a link between internal representations and extrinsic biased behavior using $P_{\text{prior}}(\text{Female})$, DOT($\vec{n}_{\text{wiki}}$, $\vec{g}$), and related metrics.

Abstract

We examine LLM representations of gender for first names in various occupational contexts to study how occupations and the gender perception of first names in LLMs influence each other mutually. We find that LLMs' first-name gender representations correlate with real-world gender statistics associated with the name, and are influenced by the co-occurrence of stereotypically feminine or masculine occupations. Additionally, we study the influence of first-name gender representations on LLMs in a downstream occupation prediction task and their potential as an internal metric to identify extrinsic model biases. While feminine first-name embeddings often raise the probabilities for female-dominated jobs (and vice versa for male-dominated jobs), reliably using these internal gender representations for bias detection remains challenging.

Paper Structure

This paper contains 46 sections, 1 equation, 9 figures, 4 tables.

Figures (9)

  • Figure 1: We derive first-name gender representations in LLMs by projecting their contextualized embeddings onto an approximated gender direction. We find that these representations shift with the occupational context, e.g., "nurse" ($90.9\%$ female) increases femininity, while "comedian" ($21.1\%$ female) shifts the representation toward masculinity. We also examine how these gender representations correlate with biased behavior in downstream occupation prediction.
  • Figure 2: The percentage of variance explained in the principal components as a result of applying PCA to the differences between gendered (or random) word embeddings from various models. These results indicate that the first PC primarily captures the gender subspace in the respective LLM embedding space.
  • Figure 3: Scatter plot between each pair of the three variables studied in \ref{sec:align_rep_and_world} and their Pearson correlation. We observe statistically significant linear correlations between each pair of the variables studied. Both the model's prior gender probability and the embedding associated with a name reflect the real-world gender distribution.
  • Figure 4: (Left of each subfigure) Change of the dot product between the name embedding from a template sentence $\vec{n}_{\text{temp}}$ and the gender direction $\vec{g}$ before and after the mention of an occupation. (Right of each subfigure) Change of the output probability of the token "female" with and without mentioning an occupation. "% Female Name" is the real-world gender distribution of a name (\ref{sec:align_rep_and_world}). "% Female Bios" is the percentage of biographies of female individuals in the Bias in Bios dataset (De-Arteaga et al., 2019), which mirrors the gender breakdown of an occupation in real life. The violet star indicates the non-stereotypical baseline where the occupation placeholder is replaced with the string "person." We observe that the gender representation of first names generally shifts with occupational contexts: within each gender bucket along the horizontal axis, stereotypically female jobs lead to a more positive dot product along the gender direction and a higher predicted probability for the female gender. We also see that the results for strongly masculine or feminine names are less affected by occupation than those for gender-ambiguous names.
  • Figure 5: (a) and (c): Llama-3.1-8B shows higher TPR for masculine names in the male-dominated occupation "pastor" but lower TPR in the female-dominated occupation "dietitian." The Pearson correlation in these plots represents the Bias Coefficients. (b) and (d): In Llama-3.1-8B, a more masculine first name increases the probability of "pastor" while feminine names have higher probabilities for "dietitian," partly explaining the TPR gap. The Spearman correlation represents the Internal Coefficients (defined in \ref{sec:bias_explanation}). Red dashed and black dotted lines show values when the first name is anonymized as "X."
  • ...and 4 more figures
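
The projection described in the captions for Figures 1, 2, and 4 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the gender direction is the first principal component of difference vectors between paired gendered word embeddings, computed here without mean-centering (an assumption, so that the shared offset between pairs is preserved). The embeddings are synthetic stand-ins for LLM embeddings, and the names `gender_direction` and `femininity` are hypothetical.

```python
import numpy as np


def gender_direction(female_vecs, male_vecs):
    """Estimate a unit gender direction from paired word embeddings.

    Assumption: the direction is the top right-singular vector of the
    (uncentered) difference matrix. Also returns the fraction of variance
    explained by each component, analogous to what Figure 2 reports.
    """
    diffs = female_vecs - male_vecs                # shape: (pairs, dim)
    _, s, vt = np.linalg.svd(diffs, full_matrices=False)
    ratios = s**2 / np.sum(s**2)                   # variance explained per PC
    g = vt[0]                                      # unit-norm first component
    # Orient the axis so that "female minus male" points in the positive direction.
    if diffs.mean(axis=0) @ g < 0:
        g = -g
    return g, ratios


def femininity(name_vec, g):
    """Dot product of a name embedding with the gender direction."""
    return float(name_vec @ g)


# Synthetic stand-in data (NOT real LLM embeddings): 10 word pairs in 16 dims,
# where gender is planted along the first coordinate.
rng = np.random.default_rng(0)
dim, pairs = 16, 10
true_axis = np.zeros(dim)
true_axis[0] = 1.0
base = rng.normal(size=(pairs, dim))
female = base + 2.0 * true_axis + 0.1 * rng.normal(size=(pairs, dim))
male = base - 2.0 * true_axis + 0.1 * rng.normal(size=(pairs, dim))

g, ratios = gender_direction(female, male)

# Two hypothetical name embeddings that differ only along the planted axis.
base_name = rng.normal(size=dim)
fem_name = base_name + true_axis
masc_name = base_name - true_axis
fem_score = femininity(fem_name, g)
masc_score = femininity(masc_name, g)
```

With real models, `female` and `male` would hold embeddings of gendered word pairs extracted from the LLM, and `femininity` would be applied to a contextualized first-name embedding with and without an occupation in the prompt, so that the two scores can be compared.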