Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess
TL;DR
This paper tackles the problem of locating splice points in speech audio under unconstrained conditions by reframing localisation as a pointer task. It introduces SigPointer, a Transformer-based pointer network that operates on continuous input signals $\mathbf{S}$ and points to splice positions using a memory $\mathbf{H}$ and per-step vectors $\mathbf{z}_{t^{*}}$, producing index distributions $\hat{\mathbf{p}}_t$ and final positions $\hat{y}_t$. The model is trained with a cosine-distance loss $d_c$ and a three-stage curriculum on data generated from anechoic sources with post-processing, achieving robust performance even under compression and noise. Empirical results show SigPointer, especially in its optimized form SigPointer*, outperforms several baselines across exact and coarse localisation tasks, with improvements of roughly $6$ to $10$ percentage points in key metrics on challenging datasets, and maintains strong performance under out-of-distribution processing chains. The work demonstrates that pointer mechanisms can effectively localise audio splices in continuous signals with relatively small models, offering a practical tool for forensic analysis and digital integrity verification.
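As a rough illustration of the pointing step described above, the following minimal sketch computes an index distribution $\hat{\mathbf{p}}_t$ over the rows of a memory $\mathbf{H}$ from a single per-step vector $\mathbf{z}$, takes the arg-max as $\hat{y}_t$, and evaluates a cosine distance against a one-hot target. The scaled dot-product scoring, the one-hot target, and the names `pointer_step` and `cosine_distance` are assumptions made for illustration; this summary does not specify how SigPointer computes its scores, so the sketch should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def pointer_step(H, z):
    """One hypothetical pointing step.

    H: (n, d) memory, one row per input signal position.
    z: (d,) per-step decoder vector.
    Returns the index distribution p_hat (n,) and the pointed index y_hat.
    """
    # Assumption: scaled dot-product similarity between z and each memory row.
    scores = H @ z / np.sqrt(H.shape[1])
    p_hat = softmax(scores)
    y_hat = int(np.argmax(p_hat))
    return p_hat, y_hat


def cosine_distance(p_hat, target):
    """Cosine distance 1 - cos(p_hat, target) between two non-zero vectors."""
    num = float(np.dot(p_hat, target))
    den = float(np.linalg.norm(p_hat) * np.linalg.norm(target))
    return 1.0 - num / den


# Toy usage: a 4-position memory where z aligns with position 2.
H = np.eye(4)
z = 5.0 * H[2]
p_hat, y_hat = pointer_step(H, z)
target = np.zeros(4)
target[2] = 1.0  # assumed one-hot encoding of the true splice index
loss = cosine_distance(p_hat, target)
```

In this toy setting the distribution peaks at the position whose memory row best matches $\mathbf{z}$, and the distance shrinks toward zero as $\hat{\mathbf{p}}_t$ approaches the one-hot target.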
Abstract
Verifying the integrity of voice recording evidence for criminal investigations is an integral part of an audio forensic analyst's work. Here, one focus is on detecting deletion or insertion operations, so-called audio splicing. While splicing is a rather simple way to alter spoken statements, careful editing can yield quite convincing results. For difficult cases or large amounts of data, automated tools can support analysts in detecting potential editing locations. To this end, several analytical and deep learning methods have been proposed. Still, few address unconstrained splicing scenarios as expected in practice. With SigPointer, we propose a pointer network framework for continuous input that uncovers splice locations naturally and more efficiently than existing works. Extensive experiments on forensically challenging data such as strongly compressed and noisy signals quantify the benefit of the pointer mechanism, with performance increases of about 6 to 10 percentage points.
