Gender Biases in Error Mitigation by Voice Assistants

Amama Mahmood; Chien-Ming Huang

TL;DR

The paper investigates how voice gender (feminine, ambiguous, masculine) and error-mitigation strategies (apology vs. compensation) interact with user gender to influence perceptions and behaviors toward AI voice assistants. Using a Wizard-of-Oz mock smart speaker across six shopping tasks (N=40 analyzed), it demonstrates that apologies evoke warmth while compensation boosts service-recovery satisfaction and perceived competence; feminine voices are generally perceived as warmer and more competent than masculine ones, with ambiguous voices offering bias-reduction potential. Male participants show more interruptions and biased behaviors, especially in response to certain voice/gender combinations, suggesting persistent sociocultural dynamics in human–AI interactions. The study highlights that ambiguous voices may mitigate gender biases in assistive contexts, informing design choices for more inclusive and effective voice assistants, while acknowledging limitations and avenues for real-world validation.

Abstract

Commercial voice assistants are largely feminized and associated with stereotypically feminine traits such as warmth and submissiveness. As these assistants continue to be adopted for everyday uses, it is imperative to understand how the portrayed gender shapes the voice assistant's ability to mitigate errors, which are still common in voice interactions. We report a study (N=40) that examined the effects of voice gender (feminine, ambiguous, masculine), error mitigation strategies (apology, compensation), and participants' gender on people's interaction behavior and perceptions of the assistant. Our results show that AI assistants that apologized appeared warmer than those that offered compensation. Moreover, male participants preferred apologetic feminine assistants over apologetic masculine ones. Furthermore, male participants interrupted AI assistants, regardless of perceived gender, more frequently than female participants did when errors occurred. Our results suggest that the perceived gender of a voice assistant biases user behavior, especially for male users, and that an ambiguous voice has the potential to reduce biases associated with gender-specific traits.

Paper Structure (45 sections, 5 figures, 4 tables)

Introduction
Background and Related Work
Error Mitigation Strategies in Human-Agent Interaction
Gender Stereotypes in Human-Agent Interaction
Gender Stereotypes in Human-Robot and Human-AI Interactions
Stereotypes Associated with Gendering of Voice Assistants
Intervention for Mitigating Gender Stereotypes and Biases
Gaps in Methodological Approach for Empirical Studies on Gender Biases
Hypotheses
Methods
Study Design and Experimental Task
Mock Smart Speaker Setup
Manipulation of Error Mitigation Strategies
Manipulation of Gendered Voices
Pilot Study 1: Picking 2 gender ambiguous voices from 4 candidates
...and 30 more sections

Figures (5)

Figure 1: Experimental setup. The participant interacts with the smart speaker using voice commands. The wired speaker is connected to the laptop. The LED light ring on the speaker is activated by the Arduino once either of the two applications running on the laptop signals that assistant or participant speech is detected. The experimenter uses TeamViewer on the desktop to access the web interface on the laptop and control the speech of the mock smart speaker (WoZ). The participant and the experimenter are in the same room, separated by a physical partition.
Figure 2: Results of perceived gender and humanlikeness of the assistant. Mixed-model repeated-measures ANOVAs were conducted to discover effects of voice gender (ambiguous [Amb.], feminine [Fem.], masculine [Masc.]), mitigation strategy (apology [Apol.], compensation [Comp.]), and participant (PCP) gender (female, male) on participants' perceptions of agent characteristics. All pairwise comparisons were conducted using Fisher's LSD method with Bonferroni correction. Error bars represent standard error (SE) and only the significant comparisons ($p<.05$) are highlighted.
Figure 3: Results of participants' service recovery satisfaction, and perceived warmth and competence of the assistant. Mixed-model repeated-measures ANOVAs were conducted to discover effects of voice gender (ambiguous [Amb.], feminine [Fem.], masculine [Masc.]), mitigation strategy (apology [Apol.], compensation [Comp.]), and participant (PCP) gender (female, male) on subjective measures. All pairwise comparisons were conducted using Fisher's LSD method with Bonferroni correction. Error bars represent standard error (SE) and only the significant comparisons ($p<.05$) are highlighted.
Figure 4: Results of interruption (Intrp.) related behavioral metrics. Generalized Estimating Equations (GEE) were used to fit a logistic regression to discover effects of voice gender and participant gender on participants' interruptions at errors. Moreover, a mixed-model ANOVA was conducted to study effects of voice gender (ambiguous, feminine, masculine), mitigation strategy (apology [Apol.], compensation [Comp.]), and participant (PCP) gender (female, male) on number of interruptions. All pairwise comparisons were conducted using Tukey's HSD method. Error bars represent standard error (SE) and only the significant comparisons ($p<.05$) are highlighted.
Figure 5: Results of number of other non-verbal reactions (RXN). A mixed-model repeated-measures ANOVA was conducted to discover effects of voice gender (ambiguous [Amb.], feminine [Fem.], masculine [Masc.]), mitigation strategy (apology [Apol.], compensation [Comp.]), and participant (PCP) gender (female, male) on number of other non-verbal reactions. All pairwise comparisons were conducted using Tukey's HSD method. Error bars represent standard error (SE) and only the significant comparisons ($p<.05$) are highlighted.
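
The captions for Figures 2 and 3 mention pairwise comparisons with Bonferroni correction. As a rough illustration of what that correction does across the three voice-gender levels, the sketch below multiplies each raw p-value by the number of comparisons and caps the result at 1. The p-values are made up for illustration and are not results from the paper; the function name `bonferroni` is likewise hypothetical, and this is not the authors' analysis code.

```python
from itertools import combinations


def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)  # number of comparisons in the family
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Three voice-gender levels give three pairwise comparisons.
levels = ["ambiguous", "feminine", "masculine"]
pairs = list(combinations(levels, 2))

# Hypothetical raw p-values, NOT taken from the paper.
raw = dict(zip(pairs, [0.030, 0.012, 0.400]))

adjusted = bonferroni(raw)

# Keep only comparisons that stay below the .05 threshold after adjustment.
significant = [pair for pair, p in adjusted.items() if p < 0.05]

for pair in pairs:
    flag = "significant" if pair in significant else "n.s."
    print(f"{pair[0]} vs. {pair[1]}: raw={raw[pair]:.3f}, "
          f"adjusted={adjusted[pair]:.3f} ({flag})")
```

With these invented inputs, a raw p-value of .030 no longer clears the .05 threshold once it is adjusted for three comparisons, which is why only comparisons that survive the correction would be highlighted in such a figure.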
