Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models
Maria Teleki, Xiangjue Dong, Haoran Liu, James Caverlee
TL;DR
This work investigates masculine defaults in discourse by analyzing podcasts and Large Language Models. It introduces the Gendered Discourse Correlation Framework (GDCF) to automatically discover gendered discourse words in spoken content, and the Discourse Word-Embedding Association Test (D-WEAT) to quantify the bias associated with those words in LLM embeddings. Using 15,117 Spotify podcast episodes, the study shows that masculine discourse words correlate with domains that reward masculine traits (e.g., business and technology/politics), and that a state-of-the-art embedding model from OpenAI represents masculine discourse words more stably and robustly than feminine discourse words. The authors identify this disparity as a representational harm and a masculine default. The authors provide data, code, and methodological tools to audit and potentially mitigate discourse-based gender bias in downstream systems, with implications for policy, ethics, and responsible AI deployment.
Abstract
Masculine defaults are widely recognized as a significant type of gender bias, but they often go unseen because they are under-researched. Masculine defaults involve three key parts: (i) the cultural context, (ii) the masculine characteristics or behaviors, and (iii) the reward for, or simply acceptance of, those masculine characteristics or behaviors. In this work, we study discourse-based masculine defaults, and propose a twofold framework for (i) the large-scale discovery and analysis of gendered discourse words in spoken content via our Gendered Discourse Correlation Framework (GDCF); and (ii) the measurement of the gender bias associated with these gendered discourse words in LLMs via our Discourse Word-Embedding Association Test (D-WEAT). We focus our study on podcasts, a popular and growing form of social media, analyzing 15,117 podcast episodes. We analyze correlations between gender and discourse words (discovered via LDA and BERTopic) to automatically form gendered discourse word lists. We then study the prevalence of these gendered discourse words in domain-specific contexts, and find that gendered discourse-based masculine defaults exist in the domains of business, technology/politics, and video games. Next, we study how a state-of-the-art LLM embedding model from OpenAI represents these gendered discourse words, and find that the masculine discourse words have a more stable and robust representation than the feminine discourse words, which may result in better system performance on downstream tasks for men. Hence, men are rewarded for their discourse patterns with better system performance by one of the state-of-the-art language models, and this embedding disparity is both a representational harm and a masculine default.
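To make the embedding-association idea concrete, the following is a minimal sketch of the standard Word-Embedding Association Test (WEAT) effect size, on which tests of this family are based. It is not the authors' D-WEAT, whose exact formulation is not given in this abstract. The function names and the 2-D toy vectors are hypothetical stand-ins for illustration; real use would substitute embeddings of the discovered discourse word lists.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Standard WEAT effect size for target sets X, Y and attribute sets A, B.

    Difference of the mean associations of X and Y, divided by the sample
    standard deviation of the associations over all target words.
    """
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    pooled_std = np.std(sx + sy, ddof=1)
    return (np.mean(sx) - np.mean(sy)) / pooled_std

# Hypothetical toy embeddings (assumption: not taken from any real model).
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]   # attribute set A
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]   # attribute set B
X = [np.array([1.0, 0.1]), np.array([0.8, 0.2])]   # target words near A
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.8])]   # target words near B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive effect size indicates that the X targets sit closer to attribute set A than the Y targets do; swapping X and Y flips the sign.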
