Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism

Jiafeng Zhong, Bin Li, Jiangyan Yi

TL;DR

This work tackles Partially Spoofed Audio Localization (PSAL) by introducing the Boundary-aware Attention Mechanism (BAM), which exploits boundary information to improve frame-level localization within a single countermeasure model. BAM operates on features from a pretrained self-supervised learning (SSL) front-end and has two modules: a Boundary Enhancement module that learns boundary features from intra-frame and inter-frame cues, and a Boundary Frame-wise Attention module that uses boundary predictions to control cross-frame interactions. The approach achieves state-of-the-art localization on PartialSpoof (e.g., EER ≈ 3.58% and F1 ≈ 96.09% with a WavLM front-end), and ablation studies show the contribution of each component. The work demonstrates the practical value of boundary-aware cues for PSAL and releases code for reproducibility; future work targets finer-grained boundary localization.
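The EER and F1 figures above are frame-level localization metrics. As a rough illustration of how such numbers are obtained from per-frame scores, here is a minimal plain-Python sketch. It assumes label 1 = fake and 0 = real, that a higher score means "more likely fake", and that F1 treats the fake class as positive; the paper's exact evaluation protocol (threshold choice, F1 averaging, EER interpolation) may differ, and the function names are hypothetical.

```python
from typing import List


def compute_eer(scores: List[float], labels: List[int]) -> float:
    """Approximate frame-level equal error rate.

    Assumptions: label 1 = fake, 0 = real; higher score = more likely fake.
    Every distinct score is tried as a threshold, and the EER is the mean of
    the two error rates where they are closest (no interpolation).
    """
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("EER needs both real and fake frames")
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        # Frames with score >= thr are predicted fake.
        real_flagged = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fake_missed = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        fpr, fnr = real_flagged / n_real, fake_missed / n_fake
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer


def f1_score(preds: List[int], labels: List[int], positive: int = 1) -> float:
    """F1 for one class (assumed: the fake class, label 1) from hard decisions."""
    tp = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(preds, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(preds, labels) if p != positive and y == positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with scores [0.1, 0.2, 0.8, 0.9, 0.3, 0.7] and labels [0, 0, 1, 1, 0, 1] the two classes are perfectly separable, so `compute_eer` returns 0.0; predicting [0, 0, 1, 1, 1, 1] against the same labels gives an F1 of 6/7 ≈ 0.857.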

Abstract

The task of partially spoofed audio localization aims to accurately determine audio authenticity at the frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains unexplored. In this work, we propose a novel method called the Boundary-aware Attention Mechanism (BAM). It consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention. The former aggregates intra-frame and inter-frame information to extract discriminative boundary features, which are subsequently used for boundary position detection and authenticity decisions; the latter uses the boundary predictions to explicitly control feature interaction between frames, enabling effective discrimination between real and fake frames. Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance among the compared methods. The code is available at https://github.com/media-sec-lab/BAM.
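Boundary position detection needs frame-level boundary targets. The abstract does not say how these are derived; one simple possibility, shown here as a hypothetical sketch, is to mark the frames on either side of every change in the frame-level authenticity label, optionally widened by a radius. The function name `boundary_labels` and the `radius` parameter are illustrative assumptions, not taken from the BAM code.

```python
from typing import List


def boundary_labels(frame_labels: List[int], radius: int = 0) -> List[int]:
    """Derive per-frame boundary targets from per-frame authenticity labels.

    Hypothetical scheme: wherever the label changes between frames t-1 and t,
    both frames are marked as boundary frames (1), and the mark is widened by
    `radius` extra frames on each side. All other frames get 0.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frame_labels)
    marks = [0] * n
    for t in range(1, n):
        if frame_labels[t] != frame_labels[t - 1]:
            # Frames t-1-radius .. t+radius (clipped to the sequence) become boundary frames.
            for k in range(t - 1 - radius, t + 1 + radius):
                if 0 <= k < n:
                    marks[k] = 1
    return marks
```

For a label sequence [0, 0, 0, 1, 1, 0] (1 = fake), `boundary_labels` returns [0, 0, 1, 1, 1, 1]: the fake segment is only two frames long, so every frame of it touches a transition. Whether boundaries are marked on one side or both sides of a transition is a design choice this sketch makes arbitrarily.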

Paper Structure

This paper contains 13 sections, 8 equations, 3 figures, and 4 tables.

Introduction
Proposed method
Pretrained self-supervised front-end
Boundary enhancement module
Boundary frame-wise attention module
Loss function
Experiments and results
Dataset and implementation details
Comparison with existing methods
Ablation study
Finer-grained resolution experiment
Conclusion
Acknowledgements

Figures (3)

Figure 1: By introducing frame-wise attention based on boundary prediction, BAM encourages each frame to exchange information with other frames that share its authenticity label.
Figure 2: Architecture of the proposed BAM. The BAM framework (left) comprises a Boundary Enhancement (BE) module (center top) and a Boundary Frame-wise Attention (BFA) module (center bottom). Dotted arrows indicate that no gradient is propagated, while solid arrows in different colors represent gradient propagation for the different loss functions. The Boundary Frame-wise Attention Block (BFAB) is identical to the Frame-wise Attention Block (FAB) except for an additional boundary masking component (indicated by dashed lines).
Figure 3: Illustration of the boundary masking operation. $\otimes$ denotes element-wise multiplication.
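Figures 1 and 3 describe the core idea: boundary predictions are turned into a mask that is multiplied element-wise ($\otimes$) with the frame-wise attention, so that each frame mainly exchanges information with frames of the same authenticity. The sketch below is a hypothetical, single-head, pure-Python reading of that idea, not the BFAB implementation. It assumes boundary predictions arrive as per-frame probabilities that a new segment starts at that frame, thresholds them at 0.5, and lets a frame attend only to frames in the same predicted segment. Learned query/key/value projections, multiple heads, and the FAB/BFAB stacking shown in Figure 2 are omitted, and all function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def segment_ids(start_probs: List[float], threshold: float = 0.5) -> List[int]:
    """Give each frame a segment index. Hypothetical rule: a new segment starts
    at frame t (t > 0) when its predicted boundary probability exceeds `threshold`."""
    seg, ids = 0, []
    for t, p in enumerate(start_probs):
        if t > 0 and p > threshold:
            seg += 1
        ids.append(seg)
    return ids


def boundary_mask(ids: List[int]) -> Matrix:
    """M[i][j] = 1.0 if frames i and j lie in the same predicted segment, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in ids] for a in ids]


def masked_frame_attention(q: Matrix, k: Matrix, v: Matrix, mask: Matrix) -> Matrix:
    """Single-head scaled dot-product attention over frames whose weights are
    multiplied element-wise by `mask` and renormalised per row.

    This equals a softmax restricted to the unmasked frames; masked entries are
    skipped before exponentiation so large masked scores cannot overflow.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Non-empty as long as mask[i][i] == 1, which boundary_mask guarantees.
        m = max(s for s, keep in zip(scores, mask[i]) if keep)
        weights = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask[i])]
        z = sum(weights)  # >= 1 because the largest unmasked score contributes exp(0)
        out.append([sum(w / z * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

With boundary probabilities [0.0, 0.1, 0.9, 0.2], `segment_ids` yields [0, 0, 1, 1], so frames 0 and 1 attend only to each other, as do frames 2 and 3. A same-segment mask is a conservative proxy for "same authenticity label": two non-adjacent segments that are both real (for example, real, fake, real) are kept apart even though they share a label. Multiplying normalised weights by the mask and renormalising is mathematically the same as setting masked scores to negative infinity before the softmax.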
