
Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

Yupei Li, Shuaijie Shao, Manuel Milling, Björn Schuller

Abstract

Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has seen further improvements with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parameters, but also highlights a bottleneck: performance gains become constrained by parameter counts. Simply stacking additional layers, as done in current LLMs, is computationally expensive and requires full retraining. Furthermore, existing low-rank adaptation methods are primarily applied to attention-based architectures, which limits their scope. Inspired by the neuronal plasticity observed in mammalian brains, we propose two novel algorithms, dropin and plasticity, that dynamically adjust the number of neurons in certain layers to flexibly modulate model parameters. We evaluate these algorithms on multiple architectures, including ResNet, Gated Recurrent Neural Networks, and Wav2Vec. Experimental results on the widely recognised ASVSpoof2019 LA, ASVSpoof2019 PA, and FakeorReal datasets demonstrate consistent improvements in computational efficiency with the dropin approach, as well as maximum relative reductions in Equal Error Rate of around 39% and 66% across these datasets with the dropin and plasticity approaches, respectively. The code and supplementary material are available at the GitHub link.


Paper Structure

This paper contains 8 sections, 5 figures, 1 table, and 1 algorithm.

Figures (5)

  • Figure 1: High-level dropin process: During the dropin phase, we load the original pretrained model weights and keep them frozen. Additional neurons are then introduced into randomly selected layers, and only the connection weights associated with these newly added neurons are trained (a minimal code sketch of this frozen-plus-new-weights idea follows the figure list).
  • Figure 2: Dropin process for convolutional layers, recurrent units, and attention encoders. These represent the specific dropin techniques adapted to different model architectures. From a mathematical perspective, this amounts to increasing the kernel size for CNNs, expanding the gate weight dimensions for GRUs, and enlarging the query, key, and value weight dimensions for attention mechanisms.
  • Figure 3: Plasticity process: The pipeline is divided into three stages. The first stage is identical to conventional training, where the model is optimised using the standard objective function. In the second stage, new neurons are dropped in, a process analogous to neurogenesis, and the whole model is then retrained so that the newly introduced information is assimilated and distributed across existing neurons. The third stage emulates neuroapoptosis by pruning the added neurons, followed by a final retraining (a sketch of this three-stage pipeline also follows the figure list).
  • Figure 4: Grad-CAM results on ResNet for one case from ASVSpoof2019 LA.
  • Figure 5: EER results for different dropin layers. "3" indicates that new neurons were dropped into the 3rd encoder layer of Wav2Vec 2.0 small on ASVSpoof2019 LA.
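
Illustrative Code Sketches

The following is a minimal PyTorch sketch of the dropin idea described in Figures 1 and 2, assuming a plain fully connected layer; the class name `DropinLinear` and its arguments are illustrative placeholders rather than the authors' released implementation. The pretrained weight matrix stays frozen, a few new output neurons are concatenated to it, and only the rows belonging to the new neurons receive gradients.

```python
import torch
import torch.nn as nn

class DropinLinear(nn.Module):
    """A fully connected layer widened by `extra` dropped-in neurons.

    The pretrained weights and bias are kept frozen; only the rows that
    belong to the newly added neurons are trainable.
    """

    def __init__(self, pretrained: nn.Linear, extra: int):
        super().__init__()
        in_features = pretrained.in_features
        # Frozen copies of the original pretrained parameters.
        self.frozen_weight = nn.Parameter(pretrained.weight.detach().clone(),
                                          requires_grad=False)
        self.frozen_bias = nn.Parameter(pretrained.bias.detach().clone(),
                                        requires_grad=False)
        # Trainable parameters for the dropped-in neurons only.
        self.new_weight = nn.Parameter(torch.randn(extra, in_features) * 0.01)
        self.new_bias = nn.Parameter(torch.zeros(extra))

    def forward(self, x):
        # Concatenate frozen and new neurons into one widened layer.
        weight = torch.cat([self.frozen_weight, self.new_weight], dim=0)
        bias = torch.cat([self.frozen_bias, self.new_bias], dim=0)
        return nn.functional.linear(x, weight, bias)

# Example: widen a pretrained 256-unit layer by 32 dropped-in neurons.
layer = DropinLinear(nn.Linear(128, 256), extra=32)
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 288])
```

Any layer consuming the widened output would also need its input dimension adjusted, which is why Figure 2 describes architecture-specific variants for convolutional kernels, GRU gate weights, and attention query/key/value weights.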
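The plasticity pipeline of Figure 3 can likewise be summarised as three training stages. The skeleton below is a sketch under the assumption of standard binary (real/fake) cross-entropy training; the `drop_in` and `prune` callbacks are hypothetical placeholders for the architecture-specific neuron insertion and removal steps, not functions from the released code.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs, lr=1e-3):
    """Standard supervised loop for binary real/fake classification.

    Assumes the model returns one logit per example; only parameters with
    requires_grad=True are updated, so any frozen weights stay untouched.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y.float())
            loss.backward()
            opt.step()
    return model

def plasticity_pipeline(model, loader, extra, drop_in, prune, epochs=5):
    # Stage 1: conventional training with the standard objective.
    model = train(model, loader, epochs)
    # Stage 2: "neurogenesis" -- drop `extra` new neurons into selected
    # layers, then retrain the whole model so the added capacity is
    # assimilated and distributed across existing neurons.
    model = drop_in(model, extra)
    model = train(model, loader, epochs)
    # Stage 3: "neuroapoptosis" -- prune the added neurons again and
    # retrain one final time at the original model size.
    model = prune(model, extra)
    model = train(model, loader, epochs)
    return model
```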