Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models
Eric Michael Smith, Adina Williams
TL;DR
This paper investigates how names used as prompts reveal gender and race/ethnicity biases in generative dialogue models. It introduces a pragmatic, template-based framework to measure bias via self-chats between model copies and evaluates three debiasing methods: name scrambling, controlled generation, and unlikelihood training. Across BlenderBot and DialoGPT, it shows that larger models exhibit stronger bias and that the proposed mitigations can substantially reduce bias while maintaining or enhancing perceived conversational quality. The work highlights practical trade-offs among debiasing approaches and emphasizes the need for intersectional bias measurement and broader ethical considerations in deployed dialogue systems.
Abstract
All AI models are susceptible to learning biases in the data they are trained on. For generative dialogue models, being trained on real human conversations containing unbalanced gender and race/ethnicity references can lead to models that display learned biases, which we define here broadly as any measurable differences in the distributions of words or semantic content of conversations based on demographic groups. We measure the strength of such biases by producing artificial conversations between two copies of a dialogue model, conditioning one conversational partner to state a name commonly associated with a certain gender and/or race/ethnicity. We find that larger-capacity models tend to exhibit more gender bias and greater stereotyping of occupations by gender. We show that several methods of tuning these dialogue models, specifically name scrambling, controlled generation, and unlikelihood training, are effective in reducing bias in conversation, including on a downstream conversational task. Name scrambling is also effective in lowering differences in token usage across conversations where partners have names associated with different genders or races/ethnicities.
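To make the measurement idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greeting template modeled on the paper's title, represents a conversation as a list of turn strings, and uses total variation distance over whitespace tokens as a simple stand-in for "differences in token usage"; the paper's actual bias metrics may differ. All function names and the template string are hypothetical.

```python
import random
from collections import Counter

# Assumed template, modeled on the paper's title; the exact wording is hypothetical.
TEMPLATE = "Hi! My name is {name}."


def make_opening(name):
    """Build the first turn, in which one speaker states a name."""
    return TEMPLATE.format(name=name)


def scramble_name(conversation, original_name, name_pool, rng):
    """Replace the stated name with a different name drawn at random from the pool.

    This illustrates the general idea of name scrambling on existing
    conversations; how the scrambled data is then used for tuning is not shown.
    """
    candidates = [n for n in name_pool if n != original_name]
    new_name = rng.choice(candidates)
    return [turn.replace(original_name, new_name) for turn in conversation]


def token_distribution(conversations):
    """Relative frequency of lowercased whitespace tokens over all turns."""
    counts = Counter(
        tok for conv in conversations for turn in conv for tok in turn.lower().split()
    )
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two token distributions (0 = identical)."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocab)
```

In use, one would collect self-chats opened with names from two demographic groups, compute `token_distribution` for each group, and compare them with `total_variation`; a value near zero would suggest similar token usage across the two groups under this simplified measure.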
