Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism

Jiafeng Zhong, Bin Li, Jiangyan Yi

TL;DR

This work tackles Partially Spoofed Audio Localization (PSAL) by introducing the Boundary-aware Attention Mechanism (BAM), which exploits boundary information to improve frame-level localization within a single countermeasure. BAM comprises two modules built on a pretrained self-supervised learning (SSL) front-end: a Boundary Enhancement module, which learns boundary features from intra- and inter-frame cues, and a Boundary Frame-wise Attention module, which uses boundary predictions to modulate cross-frame interactions. The approach achieves state-of-the-art localization on PartialSpoof (e.g., EER ≈ 3.58% and F1 ≈ 96.09% with a WavLM front-end), and ablations show the contribution of each component. The results indicate that boundary-aware cues are of practical value for PSAL. Code is provided for reproducibility, and future work targets finer-grained boundary localization.

Abstract

The task of partially spoofed audio localization aims to accurately determine audio authenticity at the frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains an unexplored research topic. In this work, we propose a novel method called Boundary-aware Attention Mechanism (BAM). Specifically, it consists of two core modules: Boundary Enhancement and Boundary Frame-wise Attention. The former combines intra-frame and inter-frame information to extract discriminative boundary features, which are subsequently used for boundary position detection and authenticity decision. The latter leverages the boundary prediction results to explicitly control the feature interaction between frames, which enables effective discrimination between real and fake frames. Experimental results on the PartialSpoof database demonstrate that our proposed method achieves the best performance. The code is available at https://github.com/media-sec-lab/BAM.

Paper Structure

This paper contains 13 sections, 8 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: By introducing frame-wise attention based on boundary prediction, BAM encourages each frame to exchange information with other frames that share the same authenticity label.
  • Figure 2: The architecture of BAM. The BAM framework (left) comprises a Boundary Enhancement (BE) Module (center top) and a Boundary Frame-wise Attention (BFA) Module (center bottom). Dotted arrows indicate no gradient propagation, while solid arrows in different colors represent the gradient propagation corresponding to the various loss functions. The Boundary Frame-wise Attention Block (BFAB) is identical to the Frame-wise Attention Block (FAB) except that it is equipped with an additional boundary masking component (indicated by dashed lines).
  • Figure 3: Illustration of the boundary masking operation. $\otimes$ denotes element-wise multiplication.
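
The idea behind Figures 1 and 3 can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that), and it rests on several assumptions that the text above does not state: boundary predictions are treated as hard 0/1 values, a predicted boundary frame is taken to start a new segment, and each attention row is renormalized after masking. The function names `boundary_mask` and `apply_boundary_mask` are hypothetical.

```python
def boundary_mask(boundary):
    """Build a frame-by-frame 0/1 mask from per-frame boundary predictions.

    Assumption: boundary[t] == 1 means frame t is a predicted boundary frame,
    and a boundary frame opens a new segment. Two frames may interact only
    if they fall in the same segment.
    """
    segment_ids = []
    current = 0
    for is_boundary in boundary:
        if is_boundary:
            current += 1  # crossing a boundary starts a new segment
        segment_ids.append(current)
    n = len(boundary)
    return [
        [1.0 if segment_ids[i] == segment_ids[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def apply_boundary_mask(attn, mask):
    """Element-wise multiply attention weights by the mask (the ⊗ of Figure 3).

    Assumption: each row is renormalized to sum to 1 afterwards; the captions
    above do not say whether the original method does this.
    """
    masked = [[a * m for a, m in zip(a_row, m_row)]
              for a_row, m_row in zip(attn, mask)]
    result = []
    for row in masked:
        total = sum(row)
        result.append([v / total for v in row] if total > 0 else row)
    return result


if __name__ == "__main__":
    # Five frames; frame 2 is predicted to be a boundary.
    boundary = [0, 0, 1, 0, 0]
    uniform_attn = [[0.2] * 5 for _ in range(5)]
    mask = boundary_mask(boundary)
    for row in apply_boundary_mask(uniform_attn, mask):
        print([round(v, 3) for v in row])
```

In this toy case, frames 0 and 1 attend only to each other, and frames 2 to 4 attend only among themselves, which mirrors the Figure 1 description of frames exchanging information with frames that share the same authenticity label.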