Greater accessibility can amplify discrimination in generative AI

Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou, Valentin Hofmann, Katharina von der Wense, Anne Lauscher

Abstract

Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their training data. Moreover, text-based interfaces remain a barrier for many users, such as those with limited literacy, motor impairments, or mobile-only devices. Voice interaction promises to expand accessibility, but unlike text, speech carries identity cues that users cannot easily mask, raising concerns about whether accessibility gains may come at the cost of equitable treatment. Here we show that audio-enabled LLMs exhibit systematic gender discrimination, shifting responses toward gender-stereotyped adjectives and occupations solely on the basis of speaker voice, and amplifying bias beyond that observed in text-based interaction. Thus, voice interfaces do not merely extend text models to a new modality but introduce distinct bias mechanisms tied to paralinguistic cues. Complementary survey evidence ($n = 1{,}000$) shows that infrequent chatbot users are most hesitant about undisclosed attribute inference and most likely to disengage when such practices are revealed. To demonstrate a potential mitigation strategy, we show that pitch manipulation can systematically regulate gender-discriminatory outputs. Overall, our findings reveal a critical tension in AI development: efforts to expand accessibility through voice interfaces simultaneously create new pathways for discrimination, demanding that fairness and accessibility be addressed in tandem.

Paper Structure

This paper contains 25 sections, 3 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Gender discrimination in audio language model responses across constrained term selection (a) and open-ended profile generation (b,c).
  • Figure 2: Gender detection capability drives discrimination in audio LLMs. Models with higher detection accuracy (a) exhibit stronger stereotypical associations (b), and voice-based bias exceeds text-based bias across models (c).
  • Figure 3: User hesitancy toward voice AI systems that infer personal attributes. Non-users and infrequent chatbot users express greater concern about attribute inference (a). Ordered logistic regression reveals that frequent usage and male sex significantly reduce hesitancy (b), with usage frequency as the strongest predictor.
  • Figure 4: Voice pitch causally drives gender discrimination. Observational analysis shows pitch modulates stereotyped responses (a); experimental manipulation confirms causality (b).
  • Figure 5: Histogram of audio sample durations for all $n = 1{,}370$ recordings.
  • ...and 6 more figures