
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen

TL;DR

This work challenges the prevailing attacker-focused narrative in audio anti-spoofing by examining training dynamics through loss-based analysis and asymmetric interventions. It shows a consistent bias toward the spoof class in per-class losses and demonstrates that certain loss functions can modulate this bias, with Generalized Cross Entropy offering modest bonafide gains. The asymmetric intervention framework reveals that many interventions improve spoof-detection performance at the expense of bonafide robustness, highlighting a fragility in bonafide modeling. The findings argue for a shift toward robust bonafide modeling and balanced evaluation, with implications for loss design, data curation, and evaluation in future anti-spoofing systems.

Abstract

Current trends in audio anti-spoofing detection research strive to improve models' ability to generalize across unseen attacks by learning to identify a variety of spoofing artifacts. This emphasis has primarily focused on the spoof class. Recently, several studies have noted that the distribution of silence differs between the two classes, which can serve as a shortcut. In this paper, we extend class-wise interpretations beyond silence. We employ loss analysis and asymmetric methodologies to move away from traditional attack-focused and result-oriented evaluations towards a deeper examination of model behaviors. Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.
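
The class-wise loss analysis described above can be illustrated with a minimal sketch. It computes the mean loss separately for each class under standard cross entropy and under Generalized Cross Entropy, using the usual definition L_q = (1 - p_y^q) / q, where p_y is the predicted probability of the true class. The function names, the label mapping (0 = spoof, 1 = bonafide), the toy logits, and q = 0.7 are illustrative assumptions, not values or code taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p_true):
    """Standard cross entropy given the probability of the true class."""
    return -math.log(p_true)

def generalized_cross_entropy(p_true, q=0.7):
    """Generalized Cross Entropy: (1 - p^q) / q.
    Approaches cross entropy as q -> 0 and a linear (MAE-like) loss at q = 1."""
    return (1.0 - p_true ** q) / q

def per_class_mean_loss(logits_batch, labels, loss_fn):
    """Average loss_fn separately over the samples of each class."""
    sums, counts = {}, {}
    for logits, y in zip(logits_batch, labels):
        p_true = softmax(logits)[y]
        sums[y] = sums.get(y, 0.0) + loss_fn(p_true)
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

# Toy batch (hypothetical): label 0 = spoof, 1 = bonafide.
logits_batch = [[2.0, -1.0], [1.5, 0.0], [0.2, 0.4], [-0.5, 1.0]]
labels = [0, 0, 1, 1]

ce_by_class = per_class_mean_loss(logits_batch, labels, cross_entropy)
gce_by_class = per_class_mean_loss(logits_batch, labels, generalized_cross_entropy)
print("CE  per class:", ce_by_class)
print("GCE per class:", gce_by_class)
```

Tracking these two per-class averages across epochs, rather than a single pooled loss, is what makes a gap between the bonafide and spoof classes visible.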

Paper Structure (14 sections, 1 figure, 5 tables)

Figures (1)

  • Figure 1: Comparison of training loss. The left and right panels show the bonafide and spoof classes, respectively. The x-axis and y-axis indicate training epochs and loss magnitude. With or without data augmentation, the losses of the two classes differ by a large margin.