The Binding Effect: Analyzing How Multi-Dimensional Cues Form Gender Bias in Instruction TTS

Kuan-Yu Chen; Yi-Cheng Lin; Po-Chung Hsieh; Huang-Cheng Chou; Chih-Fan Hsu; Jeng-Lin Li; Hung-yi Lee; Jian-Jiun Ding

Abstract

Current bias evaluations in Instruction Text-to-Speech (ITTS) often rely on univariate testing, overlooking the compositional structure of social cues. In this work, we investigate gender bias by modeling prompts as combinations of Social Status, Career stereotypes, and Persona descriptors. Analyzing open-source ITTS models, we uncover systematic interaction effects where social dimensions modulate one another, creating complex bias patterns missed by univariate baselines. Crucially, our findings indicate that these biases extend beyond surface-level artifacts, demonstrating strong associations with the semantic priors of pre-trained text encoders and the skewed distributions inherent in training data. We further demonstrate that generic diversity prompting is insufficient to override these entrenched patterns, underscoring the need for compositional analysis to diagnose latent risks in generative speech.

Paper Structure (16 sections, 7 equations, 1 figure, 6 tables)

Introduction
Methodology
Problem Formulation
Compositional Analysis Framework
Stage 1: Univariate Sensitivity
Stage 2: Modeling Compositional Interactions
Experiments
Evaluated Models
Controlled Test Setups
Evaluation Metrics
Results and Analysis
Univariate Bias and Polarization Trends
Three Paradigms of Compositional Interaction ($\mathcal{I}$)
Origins of Bias: Data Distribution and Text Encoders
Conclusions
...and 1 more sections

Figures (1)

Figure 1: Overall framework. The two-stage evaluation is illustrated with the PromptTTS++ model. Stage 1 establishes univariate gender priors: an isolated descriptor such as nurse triggers a strong female bias ($P(\mathbf{x})=0.99$). In Stage 2, combining that descriptor with attributes such as high-status and reckless produces a binding effect on the originally female-leaning nurse, shifting the perceived gender toward male ($P(\mathbf{x})=0.17$).
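The two stages in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt template, the helper names `compose_prompts` and `gender_shift`, and the `low-status` descriptor are hypothetical, and it assumes $P(\mathbf{x})$ is read as the probability of a female-perceived voice, which the caption implies but does not define. Only the two scores (0.99 and 0.17) are taken from the caption.

```python
from itertools import product

# Descriptor pools. Only "nurse", "high-status", and "reckless" come from the
# figure caption; "low-status" is a hypothetical placeholder.
STATUS = ["high-status", "low-status"]
CAREER = ["nurse"]
PERSONA = ["reckless"]


def compose_prompts(status, career, persona):
    """Stage 2 input: one prompt per (status, career, persona) combination.

    The sentence template is an assumption made for illustration.
    """
    return [f"A {s}, {p} {c}." for s, c, p in product(status, career, persona)]


def gender_shift(p_univariate, p_composed):
    """Change in the gender score when a descriptor is placed in a composition.

    Negative values mean the composed prompt leans more male than the
    isolated descriptor, under the assumed reading of P(x).
    """
    return p_composed - p_univariate


prompts = compose_prompts(STATUS, CAREER, PERSONA)

# Scores reported in the caption for PromptTTS++:
# isolated "nurse" vs. "nurse" combined with high-status and reckless.
shift = gender_shift(p_univariate=0.99, p_composed=0.17)
```

A univariate evaluation would only ever see the first score, so a plain difference like this is the simplest way to surface the change that the composition introduces.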
