FairSSD: Understanding Bias in Synthetic Speech Detectors

Amit Kumar Singh Yadav; Kratika Bhagtani; Davide Salvi; Paolo Bestagini; Edward J. Delp

TL;DR

This paper investigates fairness in synthetic speech detectors by analyzing bias across gender, age, accent, and stuttering in six detectors trained or evaluated on ASVspoof2019 and a bias dataset derived from Mozilla Common Voice. It introduces a bias-focused evaluation framework built on standardized metrics: Equal Error Rate (EER), False Positive Rate (FPR), and difference-based variants of both ($\Delta EER$, $\Delta FPR_1$, $\Delta FPR_2$, $\Delta FPR_3$). The results show widespread bias: bona fide speech is misclassified more often for male speakers, speakers at the extremes of the age range, and speakers with certain accents, and the bias against speech-impaired (stuttering) speakers is pronounced. The study provides a large-scale, reproducible benchmark by assembling 28 bias evaluation sets and publicly releasing the detectors, data, and code. The findings show that current detectors can unfairly label real speech from specific demographic groups as synthetic, which underscores the need for fairness-aware detector design and evaluation before deployment.

Abstract

Methods that can generate synthetic speech that is perceptually indistinguishable from speech recorded by a human speaker are easily available. Several incidents report misuse of synthetic speech generated by these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect synthetic speech in the wild, and are robust to noise. However, limited work has been done on understanding bias in these detectors. In this work, we examine bias in existing synthetic speech detectors to determine whether they unfairly target a particular gender, age, or accent group. We also inspect whether these detectors have a higher misclassification rate for bona fide speech from speech-impaired speakers than for bona fide speech from fluent speakers. Extensive experiments on 6 existing synthetic speech detectors using more than 0.9 million speech signals demonstrate that most detectors are gender, age, and accent biased, and that future work is needed to ensure fairness. To support future research, we release our evaluation dataset, the models used in our study, and the source code at https://gitlab.com/viper-purdue/fairssd.
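To make the notion of a higher misclassification rate for bona fide speech concrete, the following is a minimal sketch of a per-group False Positive Rate and the gap between two groups. It is an illustration only and not the paper's exact definition of $\Delta FPR_1$, $\Delta FPR_2$, or $\Delta FPR_3$. The function names, the score convention (higher score means "more likely synthetic"), the threshold, and the toy scores are all assumptions made for this example.

```python
from collections import defaultdict


def fpr_by_group(scores, groups, threshold):
    """Per-group false positive rate on bona fide speech only.

    Assumptions (hypothetical, for illustration):
      - every score belongs to a bona fide utterance,
      - a score >= threshold means the detector says "synthetic",
        so each such decision is a false positive.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += 1
        if score >= threshold:
            false_positives[group] += 1
    return {g: false_positives[g] / totals[g] for g in totals}


def fpr_gap(fprs, group_a, group_b):
    """Signed difference in FPR between two groups (a minus b)."""
    return fprs[group_a] - fprs[group_b]


# Toy bona fide scores for two hypothetical speaker groups.
scores = [0.10, 0.70, 0.20, 0.90, 0.60, 0.30]
groups = ["fluent", "fluent", "fluent", "stutter", "stutter", "stutter"]

fprs = fpr_by_group(scores, groups, threshold=0.5)
gap = fpr_gap(fprs, "stutter", "fluent")
print(fprs, gap)
```

A positive gap here means bona fide speech from the first group is flagged as synthetic more often than bona fide speech from the second group at the same threshold. A gap near zero would indicate similar treatment of the two groups under this particular measure.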

Paper Structure (27 sections, 3 figures, 16 tables)

Introduction
Related Work
Synthetic Speech Detection
Fairness of Forensic Detectors
Proposed Study
Detectors Used in Our Study
Datasets Used in Our Study
Detection Training and Evaluation Dataset
Evaluation Datasets for Bias Study
Evaluation Metrics
Experiments and Results
Experiment 1: Detection Performance
Experiment 2: Studying Bias on Gender
Experiment 3: Studying Bias on Age
Experiment 4: Studying Bias on Accent
...and 12 more sections

Figures (3)

Figure 1: Mean False Positive Rate (FPR) of 6 synthetic speech detectors on bona fide speech from fluent and speech-impaired speakers.
Figure 2: Existing Approaches for Synthetic Speech Detection.
Figure A1: Overview of Dataset Preparation for bias study.
