FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

Ju Yeon Kang, Ji Won Yoon, Semin Kim, Min Hyun Han, Nam Soo Kim

TL;DR

FADEL targets the detection of fake audio produced by unseen, out-of-distribution (OOD) spoofing attacks. It replaces the conventional softmax output with an evidential learning framework that models class probabilities via a Dirichlet distribution. By deriving class means and explicit uncertainty from the Dirichlet parameters, FADEL mitigates overconfidence and gives more reliable predictions in OOD scenarios. The approach is evaluated on the ASVspoof2019 LA benchmark and, cross-dataset, on ASVspoof2021 LA. Integrated with backbone models such as Res-TSSDNet and AASIST, it significantly improves EER and min-tDCF, and its predicted uncertainty aligns with error rates. These results suggest that uncertainty-aware fake audio detection offers a practical path to more robust anti-spoofing systems in real-world deployments.

Abstract

Recently, fake audio detection has gained significant attention, as advancements in speech synthesis and voice conversion have increased the vulnerability of automatic speaker verification (ASV) systems to spoofing attacks. A key challenge in this task is generalizing models to detect unseen, out-of-distribution (OOD) attacks. Although existing approaches have shown promising results, they inherently suffer from overconfidence because they use softmax for classification, which can produce unreliable predictions when encountering unpredictable spoofing attempts. To deal with this limitation, we propose a novel framework called fake audio detection with evidential learning (FADEL). By modeling class probabilities with a Dirichlet distribution, FADEL incorporates model uncertainty into its predictions, thereby leading to more robust performance in OOD scenarios. Experimental results on the ASVspoof2019 Logical Access (LA) and ASVspoof2021 LA datasets indicate that the proposed method significantly improves the performance of baseline models. Furthermore, we demonstrate the validity of the uncertainty estimation by showing a strong correlation between average uncertainty and equal error rate (EER) across different spoofing algorithms.
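
To make the Dirichlet-based prediction concrete, the following is a minimal sketch of how class means and an uncertainty score can be derived from a network's raw outputs. It follows the standard evidential deep learning formulation and is not taken from the paper: the softplus evidence function, the "evidence + 1" parameterization, the K / S uncertainty measure, and the function names are all assumptions made for illustration.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus, log(1 + exp(x)); used here as an
    # assumed choice for mapping raw outputs to non-negative evidence.
    return np.logaddexp(0.0, x)


def dirichlet_prediction(logits):
    """Map raw outputs of shape (batch, K) to class means and uncertainty.

    Hypothetical helper: assumes the standard evidential formulation,
    which may differ in detail from the one used in FADEL.
    """
    logits = np.asarray(logits, dtype=float)
    evidence = softplus(logits)                    # e_k >= 0
    alpha = evidence + 1.0                         # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)   # S = sum_k alpha_k
    prob = alpha / strength                        # Dirichlet mean per class
    num_classes = logits.shape[-1]
    uncertainty = num_classes / strength.squeeze(-1)  # u = K / S, in (0, 1]
    return prob, uncertainty


if __name__ == "__main__":
    # Two classes (e.g. bonafide vs. spoof): one confident input and one
    # input for which the network produces almost no evidence.
    probs, unc = dirichlet_prediction([[10.0, -10.0], [-20.0, -20.0]])
    print(probs)
    print(unc)
```

Under these assumptions, raw outputs of (10, -10) give a bonafide mean of roughly 0.92 with uncertainty around 0.17, whereas a softmax over the same values would report a probability indistinguishable from 1. When the network produces almost no evidence for either class, the means fall back to about 0.5 each and the uncertainty approaches 1, which is the behavior one would want for an unseen attack.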

Paper Structure

This paper contains 15 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An Overview of FADEL.
  • Figure 2: Histograms of probabilities for the bonafide class, $p_{bonafide}$, predicted by AASIST and AASIST-FADEL: Insets in (a) and (b) present a detailed view of spoof predictions between 0.1 and 1. (c) provides a comparison of spoof predictions between AASIST and AASIST-FADEL in a probability range of 0.1 to 1.
  • Figure 3: Scatter plot showing the relationship between average uncertainty and EER across spoofing algorithms in the ASVspoof2019 LA evaluation set. Algorithms with lower correlation are marked with an 'x'.