On the Mutual Influence of Gender and Occupation in LLM Representations
Haozhe An, Connor Baumler, Abhilasha Sancheti, Rachel Rudinger
TL;DR
This paper investigates how LLM representations of gender for first names interact with occupational contexts, proposing that gender is encoded along a latent direction whose orientation can shift with context. By deriving a gender direction from gendered word embeddings and validating it across four open-source LLMs, the authors show that first-name femininity in embeddings correlates with real-world gender statistics and changes when occupations are mentioned. In a downstream occupation-prediction task, internal gender representations partially explain biased predictions, with stronger alignment when names' perceived gender matches occupation stereotypes, yet intrinsic and extrinsic biases do not always align. The work highlights the interpretability of internal gender representations, but also emphasizes limitations in predicting or mitigating bias, calling for broader demographic coverage, larger-model analyses, and future mitigation strategies. Overall, the study advances understanding of how gender-occupation stereotypes arise in LLMs and points to directions for more inclusive and responsible language technologies, formalizing a link between internal representations and extrinsic biased behavior using $P_{\text{prior}}(\text{Female})$, DOT($\vec{n}_{\text{wiki}}$, $\vec{g}$), and related metrics.
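As a rough illustration of the projection idea summarized above, the sketch below builds a gender direction from two small sets of embeddings and scores a name embedding by its dot product with that direction. The difference-of-means construction, the unit normalization, the function names `gender_direction` and `femininity_score`, and the toy 3-dimensional vectors are all assumptions made for illustration. This summary does not specify the authors' word lists, layer choice, or exact derivation, so this should not be read as their implementation.

```python
import numpy as np


def gender_direction(feminine_vecs, masculine_vecs):
    """Return a unit vector pointing from the masculine mean to the feminine mean.

    Assumption: a difference-of-means direction; the paper may derive it differently.
    """
    diff = np.mean(feminine_vecs, axis=0) - np.mean(masculine_vecs, axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("feminine and masculine means coincide; direction undefined")
    return diff / norm


def femininity_score(name_vec, g):
    """Dot product of a name embedding with the gender direction g."""
    return float(np.dot(name_vec, g))


# Toy embeddings (hypothetical values, not taken from any model).
feminine_vecs = np.array([[1.0, 0.2, 0.0], [0.8, -0.2, 0.1]])
masculine_vecs = np.array([[-1.0, 0.1, 0.0], [-0.8, -0.1, 0.1]])
g = gender_direction(feminine_vecs, masculine_vecs)

# Two hypothetical name embeddings, one on each side of the direction.
name_a = np.array([0.5, 0.3, 0.2])
name_b = np.array([-0.4, 0.3, 0.2])
score_a = femininity_score(name_a, g)
score_b = femininity_score(name_b, g)
print(score_a, score_b)
```

Under this sign convention, a larger score means the name embedding lies further toward the feminine end of the direction. Comparing the score of the same name with and without an occupation in the surrounding context would be one way to observe the contextual shift described above.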
Abstract
We examine LLM representations of gender for first names in various occupational contexts to study how occupations and the gender perception of first names in LLMs influence each other mutually. We find that LLMs' first-name gender representations correlate with real-world gender statistics associated with the name, and are influenced by the co-occurrence of stereotypically feminine or masculine occupations. Additionally, we study the influence of first-name gender representations on LLMs in a downstream occupation prediction task and their potential as an internal metric to identify extrinsic model biases. While feminine first-name embeddings often raise the probabilities for female-dominated jobs (and vice versa for male-dominated jobs), reliably using these internal gender representations for bias detection remains challenging.
