Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
Jiafeng Zhong, Bin Li, Jiangyan Yi
TL;DR
This work tackles Partially Spoofed Audio Localization (PSAL) by introducing the Boundary-aware Attention Mechanism (BAM), which exploits boundary information to enhance frame-level localization within a single countermeasure. BAM comprises a Boundary Enhancement module that learns boundary features from intra- and inter-frame cues and a Boundary Frame-wise Attention module that uses boundary predictions to modulate cross-frame interactions, guided by a pretrained SSL front-end. The approach achieves state-of-the-art localization on PartialSpoof (e.g., EER ≈ 3.58% and F1 ≈ 96.09% with a WavLM front-end) and is supported by extensive ablations showing each component’s contribution. The work demonstrates the practical value of boundary-aware cues for PSAL and provides code for reproducibility, with future work aimed at finer-grained boundary localization.
Abstract
The task of partially spoofed audio localization aims to accurately determine audio authenticity at the frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains an unexplored research topic. In this work, we propose a novel method called Boundary-aware Attention Mechanism (BAM). Specifically, it consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention. The former combines intra-frame and inter-frame information to extract discriminative boundary features, which are subsequently used for boundary position detection and authenticity decisions. The latter leverages the boundary prediction results to explicitly control the feature interaction between frames, which achieves effective discrimination between real and fake frames. Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance. The code is available at https://github.com/media-sec-lab/BAM.
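As a rough illustration of what it can mean for boundary predictions to control the feature interaction between frames, the sketch below restricts self-attention to frames that fall in the same predicted segment. It is a minimal pure-Python sketch, not the authors' implementation: the function names `segment_ids` and `boundary_masked_attention`, the 0.5 threshold, the hard masking rule, and the unparameterized dot-product attention are all assumptions made for illustration. The released repository is the reference for the actual method.

```python
import math


def segment_ids(boundary_probs, threshold=0.5):
    """Give each frame a segment index (hypothetical helper).

    Assumption: a frame whose boundary probability reaches the
    threshold starts a new segment.
    """
    ids, current = [], 0
    for t, p in enumerate(boundary_probs):
        if t > 0 and p >= threshold:
            current += 1
        ids.append(current)
    return ids


def boundary_masked_attention(features, boundary_probs, threshold=0.5):
    """Dot-product self-attention limited to frames in the same segment.

    features: list of per-frame feature vectors (lists of floats).
    boundary_probs: one predicted boundary probability per frame.
    Assumption: interaction across a predicted boundary is blocked
    entirely (a hard mask); a soft weighting would be another option.
    """
    ids = segment_ids(boundary_probs, threshold)
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, query in enumerate(features):
        # Frames this query is allowed to attend to (always includes i).
        allowed = [j for j in range(len(features)) if ids[j] == ids[i]]
        scores = [
            scale * sum(a * b for a, b in zip(query, features[j]))
            for j in allowed
        ]
        # Numerically stable softmax over the allowed frames only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out = [0.0] * dim
        for weight, j in zip(exps, allowed):
            for d in range(dim):
                out[d] += (weight / total) * features[j][d]
        outputs.append(out)
    return outputs


# Toy input: two frames of one kind, then two frames of another,
# with a predicted boundary at the third frame.
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_boundaries = [0.1, 0.2, 0.9, 0.1]
toy_output = boundary_masked_attention(toy_features, toy_boundaries)
```

With this toy input the frames split into segments `[0, 0, 1, 1]`, so each frame only aggregates features from its own side of the boundary and the two groups stay separated instead of being averaged together.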
