Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models

Maria Teleki, Xiangjue Dong, Haoran Liu, James Caverlee

TL;DR

This work investigates masculine defaults in discourse by analyzing podcasts and Large Language Models. It introduces the Gendered Discourse Correlation Framework (GDCF) to automatically discover gendered discourse words from spoken content and the Discourse Word-Embedding Association Test (D-WEAT) to quantify their bias in LLM embeddings. Using 15,117 Spotify podcast episodes, the study shows masculine discourse words correlate with domains that reward masculine traits (e.g., business and technology/politics) and that LLM embeddings encode these words with greater stability, signaling a representational harm and a masculine default. The authors provide data, code, and methodological tools to audit and potentially mitigate discourse-based gender bias in downstream systems, with implications for policy, ethics, and responsible AI deployment.

Abstract

Masculine defaults are widely recognized as a significant type of gender bias, but they are often unseen as they are under-researched. Masculine defaults involve three key parts: (i) the cultural context, (ii) the masculine characteristics or behaviors, and (iii) the reward for, or simply acceptance of, those masculine characteristics or behaviors. In this work, we study discourse-based masculine defaults, and propose a twofold framework for (i) the large-scale discovery and analysis of gendered discourse words in spoken content via our Gendered Discourse Correlation Framework (GDCF); and (ii) the measurement of the gender bias associated with these gendered discourse words in LLMs via our Discourse Word-Embedding Association Test (D-WEAT). We focus our study on podcasts, a popular and growing form of social media, analyzing 15,117 podcast episodes. We analyze correlations between gender and discourse words -- discovered via LDA and BERTopic -- to automatically form gendered discourse word lists. We then study the prevalence of these gendered discourse words in domain-specific contexts, and find that gendered discourse-based masculine defaults exist in the domains of business, technology/politics, and video games. Next, we study the representation of these gendered discourse words from a state-of-the-art LLM embedding model from OpenAI, and find that the masculine discourse words have a more stable and robust representation than the feminine discourse words, which may result in better system performance on downstream tasks for men. Hence, men are rewarded for their discourse patterns with better system performance by one of the state-of-the-art language models -- and this embedding disparity is a representational harm and a masculine default.
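As a rough illustration of the correlation step described in the abstract, the sketch below tests every unordered pair of feature vectors and labels each pair significant or insignificant. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which correlation statistic, significance test, or multiple-comparison correction GDCF uses, so the Pearson coefficient, the permutation test, and the Bonferroni adjustment are all assumptions here, and the feature names and toy values are hypothetical. The pair count matches the Figure 2 caption, where 124 feature vectors give $\binom{124}{2}$ tests.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value (assumed test, not taken from the paper)."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def label_pairs(features, alpha=0.05):
    """Map each feature pair to (r, p, significant) with a Bonferroni threshold."""
    pairs = list(itertools.combinations(sorted(features), 2))
    threshold = alpha / len(pairs)  # assumed correction
    results = {}
    for name_i, name_j in pairs:
        r = pearson(features[name_i], features[name_j])
        p = permutation_p_value(features[name_i], features[name_j])
        results[(name_i, name_j)] = (r, p, p <= threshold)
    return results


# Hypothetical toy data: 20 "episodes", one gender indicator, two topic weights.
features = {
    "gender": [float(i % 2) for i in range(20)],
    "topic_a": [0.8 * (i % 2) + 0.1 * (i % 3) for i in range(20)],
    "topic_b": [1.0 if i % 4 < 2 else 0.0 for i in range(20)],
}
results = label_pairs(features)
for pair, (r, p, significant) in results.items():
    print(pair, round(r, 3), round(p, 4), significant)

# Number of pairwise tests for 124 feature vectors.
n_tests = math.comb(124, 2)
```

In this toy setup `topic_a` is built to track the gender indicator and `topic_b` is built to be uncorrelated with it, so the two pairs involving `gender` land on opposite sides of the threshold. With real data the feature vectors would have one entry per podcast episode.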

Paper Structure

This paper contains 42 sections, 6 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: An overview of our two-part framework: (i) Using our Gendered Discourse Correlation Framework (GDCF, as shown in Figure 2), we obtain gendered discourse word lists. (ii) We then perform our Discourse Word-Embedding Association Test (D-WEAT, as shown here in Figure 1). We form parallel sentences, $s$ and $s'$, by swapping masculine discourse words (e.g. "going") for feminine discourse words (e.g. "like"): $s =$ "And I was going, hey, it's cold outside..." and $s' =$ "And I was like, hey, it's cold outside...". We find that the masculine discourse words have a more stable embedding representation -- this is a representational harm and a masculine default.
  • Figure 2: GDCF (Gendered Discourse Correlation Framework) Diagram: Testing for correlations with an example of a significant correlation and an insignificant correlation -- all $(\vec{f_i},\vec{f_j})$ pairs are labeled significant or insignificant. $\lvert \vec{f_i} \rvert = 15{,}117$ podcast episodes. $z=\binom{124}{2}=7{,}626$ correlation tests for the 124 total feature vectors.
  • Figure 3: Impact of $\tau$ on the average percentage of $S_w$ segments which move closer to the women concept ($A_w$) versus the men concept ($A_m$).
  • Figure 4: Impact of $\tau$ on the average percentage of $S_m$ segments which move closer to the women concept ($A_w$) versus the men concept ($A_m$).
  • Figure 5: Impact of $\gamma$ on the average percentage of $S_w$ segments which move closer to the women concept ($A_w$) versus the men concept ($A_m$).
  • ...and 2 more figures
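
The parallel-sentence construction in the Figure 1 caption can be sketched as follows. This is only an illustration under loud assumptions: the captions do not define the D-WEAT statistic or the roles of $\tau$ and $\gamma$, so none of that is modeled here. The function names, the cosine-similarity comparison against single concept vectors, and the hand-made 2-D `TOY_VECTORS` are all hypothetical stand-ins, chosen only so the example runs; they say nothing about any real embedding model.

```python
import math
import re


def swap_word(sentence, source, target):
    """Replace whole-word occurrences of `source` with `target`."""
    return re.sub(rf"\b{re.escape(source)}\b", target, sentence)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical toy word vectors; a real study would query an embedding model.
TOY_VECTORS = {
    "going": (1.0, 0.0),
    "like": (0.0, 1.0),
    "hey": (0.5, 0.5),
    "cold": (0.5, 0.5),
}


def toy_embed(text):
    """Average the toy vectors of known words (placeholder for a real model)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def closer_concept(s, s_prime, a_w, a_m, embed):
    """Name the concept whose cosine similarity rose more after the swap."""
    shift_w = cosine(embed(s_prime), a_w) - cosine(embed(s), a_w)
    shift_m = cosine(embed(s_prime), a_m) - cosine(embed(s), a_m)
    return "women" if shift_w > shift_m else "men"


s = "And I was going, hey, it's cold outside..."
s_prime = swap_word(s, "going", "like")

# Hypothetical concept vectors standing in for A_w and A_m.
a_w, a_m = (0.0, 1.0), (1.0, 0.0)
direction = closer_concept(s, s_prime, a_w, a_m, toy_embed)
print(s_prime)
print(direction)
```

Passing the embedding function as a parameter keeps the swap-and-compare logic separate from the model, so `toy_embed` could be replaced by a call to whichever embedding model is under audit.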