Gender-ambiguous voice generation through feminine speaking style transfer in male voices

Maria Koutsogiannaki; Shafel Mc Dowall; Ioannis Agiomyrgiannakis

TL;DR

This work addresses the need for gender-ambiguous synthetic voices by incorporating feminine speaking style into masculine timbre with pitch shifting toward the gender boundary. Using a female Azure TTS voice morphed onto male targets and shifted by $3$ and $4$ semitones toward the boundary at $170$ Hz, the authors generate candidate voices (mJames, mTaylor) and compare them to pitch-only baselines (pJames, pTaylor). A bias-resistant evaluation framework with three listening tests and clearly defined metrics demonstrates that style transfer yields greater gender-ambiguity than pitch modification alone, while maintaining good audio quality (with modest degradation). The study provides the first explicit emphasis on speaking style in gender-ambiguous voice generation, defines ambiguity criteria, and outlines a responsible AI evaluation approach suitable for post-processing TTS systems, laying groundwork for more inclusive voice technologies across diverse gender identities.
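The semitone shifts mentioned above correspond to fixed frequency ratios on the equal-tempered scale: a shift of $n$ semitones multiplies the fundamental frequency by $2^{n/12}$. The following is a minimal sketch of that arithmetic only, not the authors' processing pipeline. The function names and the example starting pitch are hypothetical; only the 3- and 4-semitone shifts and the 170 Hz boundary come from the text.

```python
import math

GENDER_BOUNDARY_HZ = 170.0  # male-female boundary quoted in the TL;DR


def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Return the frequency obtained by shifting f0_hz by a number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def semitones_between(f_from_hz: float, f_to_hz: float) -> float:
    """Return the signed distance in semitones from f_from_hz to f_to_hz."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


if __name__ == "__main__":
    # Frequency ratios implied by the two shifts used in the paper.
    for n in (3, 4):
        print(f"{n} semitones -> ratio {shift_semitones(1.0, n):.4f}")

    # Hypothetical mean pitch of a male voice; NOT a value from the paper.
    example_f0_hz = 135.0
    gap = semitones_between(example_f0_hz, GENDER_BOUNDARY_HZ)
    print(f"{example_f0_hz} Hz is {gap:.2f} semitones below the boundary")
```

A 3-semitone shift raises pitch by roughly 19% and a 4-semitone shift by roughly 26%, so the shift needed to reach the boundary depends on each speaker's starting pitch.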

Abstract

Recently, and under the umbrella of Responsible AI, efforts have been made to develop gender-ambiguous synthetic speech that represents, with a single voice, all individuals on the gender spectrum. However, research efforts have completely overlooked speaking style, despite the differences found among binary and non-binary populations. In this work, we synthesise gender-ambiguous speech by combining the timbre of a male speaker with the manner of speech of a female speaker, using voice morphing and pitch shifting towards the male-female boundary. Subjective evaluations indicate that the ambiguity of the morphed samples that convey the female speech style is higher than that of samples that undergo plain pitch transformations, suggesting that speaking style can be a contributing factor in creating gender-ambiguous speech. To our knowledge, this is the first study that explicitly uses the transfer of speaking style to create gender-ambiguous voices.

Paper Structure (11 sections, 1 figure, 3 tables)

Introduction
Related work
Speech differences among binary and non-binary
Gender-ambiguous voice synthesis
Motivation
Methodology
Gender-ambiguous voice generation
Evaluation framework
Evaluation metrics
Evaluation
Conclusion

Figures (1)

Figure 1: Subjective evaluation scores on each metric (Q1: Gender classification, Q2: Confidence, Q3: Surprise, Q4: Femininity/masculinity, Q5: Quality, as described in the metrics table) for all 3 listening tests A, B and C. The proposed gender-ambiguous voices are mTaylor and mJames, which resulted from the voice morphing of Bella (source speaker) onto Taylor and James (target speakers) respectively. pTaylor and pJames were derived by pitch-shifting the voices of Taylor and James by 3 and 4 semitones respectively. Ryan, Alfie, Sonia and Abbi are binary voices used to balance the listening test and for statistical analysis.
