Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models

Eric Michael Smith, Adina Williams

TL;DR

This paper investigates how names used as prompts reveal gender and race/ethnicity biases in generative dialogue models. It introduces a pragmatic, template-based framework to measure bias via self-chats between model copies and evaluates three debiasing methods: name scrambling, controlled generation, and unlikelihood training. Across BlenderBot and DialoGPT, it shows that larger models exhibit stronger bias and that the proposed mitigations can substantially reduce bias while maintaining or enhancing perceived conversational quality. The work highlights practical trade-offs among debiasing approaches and emphasizes the need for intersectional bias measurement and broader ethical considerations in deployed dialogue systems.

Abstract

All AI models are susceptible to learning biases in data that they are trained on. For generative dialogue models, being trained on real human conversations containing unbalanced gender and race/ethnicity references can lead to models that display learned biases, which we define here broadly as any measurable differences in the distributions of words or semantic content of conversations based on demographic groups. We measure the strength of such biases by producing artificial conversations between two copies of a dialogue model, conditioning one conversational partner to state a name commonly associated with a certain gender and/or race/ethnicity. We find that larger capacity models tend to exhibit more gender bias and greater stereotyping of occupations by gender. We show that several methods of tuning these dialogue models, specifically name scrambling, controlled generation, and unlikelihood training, are effective in reducing bias in conversation, including on a downstream conversational task. Name scrambling is also effective in lowering differences in token usage across conversations where partners have names associated with different genders or races/ethnicities.
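The measurement idea in the abstract (seed a conversation with a templated name, then compare word distributions across groups of conversations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the template string is modeled on the paper's title, the whitespace tokenizer is a simplification, and total variation distance is a stand-in metric rather than the statistic the authors report. The toy conversations are hypothetical and are not outputs of any model.

```python
from collections import Counter

# Assumed template, modeled on the paper's title; the real templates may differ.
TEMPLATE = "Hi, my name is {name}."


def opening_line(name: str) -> str:
    """Build the templated first turn in which Speaker A states a name."""
    return TEMPLATE.format(name=name)


def token_distribution(conversations):
    """Relative frequency of each whitespace-separated, lowercased token."""
    counts = Counter()
    for text in conversations:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two token distributions (0 = identical)."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocab)


# Hypothetical toy conversations standing in for self-chats from two name groups.
group_a = ["i love to cook", "i love my garden"]
group_b = ["i love to fish", "i love my truck"]

bias_score = total_variation(token_distribution(group_a),
                             token_distribution(group_b))
print(opening_line("Martha"))
print(f"token-usage difference: {bias_score:.3f}")
```

A score near 0 means the two groups of conversations use words in similar proportions; larger values indicate that word usage shifts with the name given in the opening turn.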

Paper Structure

This paper contains 28 sections, 2 figures, 15 tables.

Figures (2)

  • Figure 1: Gender breakdown of Speaker A's assigned name when a certain occupation is mentioned in a BlenderBot3B self-chat, plotted against the gender ratio of that occupation in the US workforce, as listed by the U.S. Bureau of Labor Statistics. The top 4 occupations most overindexed in woman-name conversations and in man-name conversations are annotated.
  • Figure 2: Gender bias as a function of speaker (A vs. B) and turn, measured in self-chats for various sizes of BlenderBot. Gender-classifier bias is defined as in Table \ref{table:bias_by_name_genderedness}. Bias tends to be larger for larger models, as well as earlier on in the conversation (i.e., closer to turn A1, when Speaker A states their templated name).