Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets

Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess

TL;DR

This paper tackles the problem of locating splice points in speech audio under unconstrained conditions by reframing localisation as a pointer task. It introduces SigPointer, a Transformer-based pointer network that operates on continuous input signals $\mathbf{S}$ and points to splice positions using a memory $\mathbf{H}$ and per-step vectors $\mathbf{z}_{t^{*}}$, producing index distributions $\hat{\mathbf{p}}_t$ and final positions $\hat{y}_t$. The model is trained with a cosine-distance loss $d_c$ and a three-stage curriculum on data generated from anechoic sources with post-processing, achieving robust performance even under compression and noise. Empirical results show SigPointer, especially in its optimized form SigPointer*, outperforms several baselines across exact and coarse localisation tasks, with improvements of roughly $6$ to $10$ percentage points in key metrics on challenging datasets, and maintains strong performance under out-of-distribution processing chains. The work demonstrates that pointer mechanisms can effectively localise audio splices in continuous signals with relatively small models, offering a practical tool for forensic analysis and digital integrity verification.
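The pointing step summarised above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes that each memory row of $\mathbf{H}$ is scored against the step vector $\mathbf{z}_{t^{*}}$ with a plain dot product, that a softmax turns the scores into $\hat{\mathbf{p}}_t$, that $\hat{y}_t$ is the argmax, and that the cosine distance $d_c$ is taken between $\hat{\mathbf{p}}_t$ and a one-hot target. The function names `point` and `cosine_distance` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def point(memory, z):
    # Assumed scoring: dot product between each memory row and the step vector z.
    scores = [sum(h_i * z_i for h_i, z_i in zip(h, z)) for h in memory]
    p_hat = softmax(scores)                                  # index distribution
    y_hat = max(range(len(p_hat)), key=p_hat.__getitem__)    # pointed position
    return p_hat, y_hat

def cosine_distance(u, v):
    # d_c(u, v) = 1 - cos(u, v); both vectors must be non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy memory with three positions and a step vector that favours position 1.
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z = [0.0, 2.0]
p_hat, y_hat = point(H, z)

# Assumed training target: one-hot vector marking the true splice position.
target = [0.0, 1.0, 0.0]
loss = cosine_distance(p_hat, target)
```

In this toy case the pointer selects index 1, and the loss shrinks towards zero as $\hat{\mathbf{p}}_t$ concentrates on the target position.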

Abstract

Verifying the integrity of voice recording evidence for criminal investigations is an integral part of an audio forensic analyst's work. Here, one focus is on detecting deletion or insertion operations, so-called audio splicing. While this is a rather simple way to alter spoken statements, careful editing can yield quite convincing results. For difficult cases or large amounts of data, automated tools can help detect potential editing locations. To this end, several analytical and deep learning methods have been proposed. Still, few address unconstrained splicing scenarios as expected in practice. With SigPointer, we propose a pointer network framework for continuous input that uncovers splice locations naturally and more efficiently than existing works. Extensive experiments on forensically challenging data such as strongly compressed and noisy signals quantify the benefit of the pointer mechanism, with performance increases of about 6 to 10 percentage points.

Paper Structure (15 sections, 2 figures, 1 table)


Figures (2)

  • Figure 1: SigPointer model for locating splices in audio signals
  • Figure 2: Jaccard index $J$ (mean of 5 training runs) for $n \in [0,5]$ splices (panels a to d) and robustness towards out-of-distribution multi-compression and real noise post-processing. The pointer framework (red tones) is clearly superior in all tests.
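
The Jaccard index $J$ reported in Figure 2 can be illustrated with a minimal sketch. It assumes $J$ is computed as the size of the intersection over the size of the union of the predicted and true splice-position sets, and that two empty sets (a signal with $n = 0$ splices, correctly predicted as unspliced) score 1. Both conventions are assumptions here, and `jaccard_index` is a hypothetical helper name.

```python
def jaccard_index(predicted, actual):
    # Treat both inputs as sets of splice positions (e.g. frame indices).
    pred, act = set(predicted), set(actual)
    union = pred | act
    if not union:
        # Assumed convention: no splices predicted and none present is a perfect match.
        return 1.0
    return len(pred & act) / len(union)

# One of two predicted positions is correct: intersection 1, union 3.
example = jaccard_index([3, 7], [3, 9])
```

Under this definition a single misplaced splice is penalised twice, once as a missed position and once as a false one, which makes the exact-localisation metric strict.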