FairSSD: Understanding Bias in Synthetic Speech Detectors

Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

TL;DR

This paper investigates fairness in synthetic speech detectors by analyzing bias across gender, age, accent, and stuttering in six detectors trained or evaluated on ASVspoof2019 and a Mozilla Common Voice–derived bias dataset. It introduces a bias-focused evaluation framework with standardized metrics based on the equal error rate ($EER$) and the false positive rate ($FPR$), namely $\Delta EER$, $\Delta FPR_1$, $\Delta FPR_2$, and $\Delta FPR_3$. The results show widespread bias: detectors more often misclassify bona fide speech from male speakers, from speakers at the extremes of the age range, and from speakers with certain accents, and they show pronounced bias against speech-impaired (stuttering) speakers. The study provides a large-scale, reproducible benchmark by assembling 28 bias evaluation sets and publicly releasing the detectors, data, and code. The findings show that current detectors can unfairly label real speech from specific demographic groups as synthetic, which underscores the need for fairness-aware detector design and evaluation before deployment.

Abstract

Methods that can generate synthetic speech that is perceptually indistinguishable from speech recorded by a human speaker are easily available. Several reported incidents involve misuse of synthetic speech generated by these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect synthetic speech in the wild, and are robust to noise. However, limited work has been done on understanding bias in these detectors. In this work, we examine bias in existing synthetic speech detectors to determine whether they unfairly target a particular gender, age, or accent group. We also inspect whether these detectors have a higher misclassification rate for bona fide speech from speech-impaired speakers than for bona fide speech from fluent speakers. Extensive experiments on 6 existing synthetic speech detectors using more than 0.9 million speech signals demonstrate that most detectors are gender, age, and accent biased, and that future work is needed to ensure fairness. To support future research, we release our evaluation dataset, the models used in our study, and source code at https://gitlab.com/viper-purdue/fairssd.
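
To make the notion of a group-wise misclassification rate concrete, the following is a minimal sketch of how a false positive rate on bona fide speech could be compared across two groups. It is not the paper's implementation, and it does not reproduce the paper's definitions of $\Delta FPR_1$, $\Delta FPR_2$, or $\Delta FPR_3$. The function names, the score convention (a higher score means "more likely synthetic"), the threshold of 0.5, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def false_positive_rate(bona_fide_scores, threshold):
    """Fraction of bona fide utterances flagged as synthetic.

    Assumption: the detector outputs a score where higher means
    "more likely synthetic", and a score above `threshold` is
    labeled synthetic.
    """
    scores = np.asarray(bona_fide_scores, dtype=float)
    return float(np.mean(scores > threshold))


def fpr_gap(scores_by_group, threshold):
    """Per-group FPR and the spread between the worst and best group.

    This gap is a generic illustration of a group disparity measure,
    not one of the paper's metrics.
    """
    rates = {
        group: false_positive_rate(scores, threshold)
        for group, scores in scores_by_group.items()
    }
    gap = max(rates.values()) - min(rates.values())
    return rates, gap


# Hypothetical detector scores on bona fide speech only (toy values).
bona_fide_scores = {
    "fluent": [0.05, 0.10, 0.20, 0.60],
    "stuttering": [0.30, 0.55, 0.70, 0.90],
}

rates, gap = fpr_gap(bona_fide_scores, threshold=0.5)
print(rates)
print(gap)
```

With these toy values, one of four fluent utterances and three of four stuttering utterances exceed the threshold, so the detector in this example would wrongly flag bona fide speech from the stuttering group three times as often. A gap of zero would mean both groups are misclassified at the same rate at that threshold.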

Paper Structure (27 sections, 3 figures, 16 tables)

Figures (3)

  • Figure 1: Mean False Positive Rate (FPR) of 6 synthetic speech detectors on bona fide speech from fluent and speech-impaired speakers.
  • Figure 2: Existing Approaches for Synthetic Speech Detection.
  • Figure A1: Overview of Dataset Preparation for bias study.