Sparse deepfake detection promotes better disentanglement
Antoine Teissier, Marie Tahon, Nicolas Dugué, Aghilas Sini
TL;DR
Deepfake detection must balance accuracy, efficiency, and interpretability. The paper introduces sparse latent representations via a TopK activation on the final layer of AASIST to promote disentanglement, treating attacks as factors and evaluating with mutual information–based metrics (completeness, modularity). Empirical results on ASVspoof5 show that a highly sparse latent space (≈95% zeros) can achieve a best test EER of $23.36\%$ with $D=320$, $k=20$, and that MI-based analyses reveal clearer, more attack-specific latent structure, with some attacks encoded directly in latent space. This approach demonstrates that sparsity can yield both predictive gains and interpretable latent factors, paving the way for explainable deepfake detectors specialized to spoofing attributes.
Abstract
Due to the rapid progress of speech synthesis, deepfake detection has become a major concern in the speech processing community. Because it is a critical task, systems must not only be efficient and robust, but also provide interpretable explanations. Among the different approaches to explainability, we focus on the interpretation of latent representations. In this paper, we study the final embedding layer of AASIST, a deepfake detection architecture. We apply a TopK activation, inspired by sparse autoencoders (SAEs), to this layer to obtain sparse representations that are used in the decision process. We demonstrate that sparse deepfake detection can improve detection performance, reaching an EER of 23.36% on the ASVspoof5 test set at 95% sparsity. We then show that these representations provide better disentanglement, as measured by completeness and modularity metrics based on mutual information. Notably, some attacks are directly encoded in the latent space.
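To make the TopK activation concrete, the following minimal sketch keeps the k largest entries of an embedding and zeroes the rest. It is an illustration under stated assumptions, not the authors' implementation: the function names `topk_activation` and `sparsity` are hypothetical, the embedding is random rather than produced by AASIST, selection is by raw value (a variant could select by magnitude or after a ReLU), and plain Python lists stand in for the tensors a real model would use. With D = 320 and k = 20, the fraction of zeros is 1 - 20/320 = 93.75%, in line with the roughly 95% sparsity reported.

```python
import random


def topk_activation(z, k):
    """Keep the k largest entries of z and set all others to zero."""
    if not 0 < k <= len(z):
        raise ValueError("k must satisfy 1 <= k <= len(z)")
    # Indices of the k largest values; ties are broken by position.
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(z)]


def sparsity(z):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in z if v == 0.0) / len(z)


# Assumed sizes, taken from the configuration quoted in the TL;DR.
D, K = 320, 20

# Stand-in for a final-layer embedding (random, for illustration only).
random.seed(0)
embedding = [random.gauss(0.0, 1.0) for _ in range(D)]

sparse = topk_activation(embedding, K)
print(sparsity(sparse))  # 0.9375
```

Because only k dimensions stay active per input, each active dimension can be inspected individually, which is what makes attack-level analyses with mutual information tractable.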
