Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection

Chu-Hsuan Abraham Lin, Chen-Yu Liu, Samuel Yen-Chi Chen, Kuan-Cheng Chen

TL;DR

A Quantum-Trained Convolutional Neural Network (QT-CNN) framework designed to enhance the detection of deepfake audio, leveraging the computational power of quantum machine learning (QML).

Abstract

The rise of deepfake technologies has posed significant challenges to privacy, security, and information integrity, particularly in audio and multimedia content. This paper introduces a Quantum-Trained Convolutional Neural Network (QT-CNN) framework designed to enhance the detection of deepfake audio, leveraging the computational power of quantum machine learning (QML). The QT-CNN employs a hybrid quantum-classical approach, integrating Quantum Neural Networks (QNNs) with classical neural architectures to optimize training efficiency while reducing the number of trainable parameters. Our method incorporates a novel quantum-to-classical parameter mapping that uses quantum states to enhance the expressive power of the model, achieving up to a 70% reduction in trainable parameters compared to classical models without compromising accuracy. Data pre-processing involves extracting essential audio features, label encoding, feature scaling, and constructing sequential datasets for robust model evaluation. Experimental results demonstrate that the QT-CNN achieves performance comparable to that of traditional CNNs, maintaining high accuracy in both training and testing across varying numbers of QNN blocks. The QT framework's ability to reduce computational overhead while maintaining performance underscores its potential for real-world applications in deepfake detection and other resource-constrained scenarios. This work highlights the practical benefits of integrating quantum computing into artificial intelligence, offering a scalable and efficient approach to advancing deepfake detection technologies.
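The pre-processing steps named in the abstract (label encoding, feature scaling, and building sequential datasets) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The feature count (26), the window length (10), the 80/20 split, and the choice to label each window by its last frame are all assumptions. The helpers `label_encode`, `standardize`, and `make_sequences` are hypothetical names.

```python
import numpy as np

def label_encode(labels):
    """Map class names to integer ids; classes are sorted for determinism (FAKE -> 0, REAL -> 1)."""
    classes = sorted(set(labels))
    lookup = {c: i for i, c in enumerate(classes)}
    return np.array([lookup[c] for c in labels]), classes

def standardize(train, test):
    """Z-score each feature using train-set statistics only, so the test set does not leak into scaling."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-8  # small epsilon guards against constant features
    return (train - mean) / std, (test - mean) / std

def make_sequences(features, targets, window):
    """Stack `window` consecutive frames per sample; each window takes the label of its last frame (assumption)."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window + 1)])
    y = targets[window - 1:]
    return X, y

# Hypothetical data: 100 frames x 26 extracted audio features, half REAL and half FAKE.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 26))
names = ["REAL"] * 50 + ["FAKE"] * 50

y, classes = label_encode(names)
X_train, X_test = standardize(frames[:80], frames[80:])
Xs, ys = make_sequences(X_train, y[:80], window=10)
print(classes, Xs.shape, ys.shape)  # ['FAKE', 'REAL'] (71, 10, 26) (71,)
```

The scaler is fitted on the training split alone so that evaluation statistics stay independent of the test data. With 80 training frames and a window of 10, the sliding window yields 80 - 10 + 1 = 71 sequences.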

Paper Structure

This paper contains 11 sections, 2 equations, and 4 figures.

Figures (4)

  • Figure 1: Conceptual diagram of the Quantum-Train framework, where the blue line represents the quantum-assisted training of the detection model, and the green line indicates inference on classical hardware such as GPUs for real-time classification of audio as real or fake.
  • Figure 2: Schematic of the QT framework illustrating the flow from the N-qubit QNN with parameterized Ry gates, through the mapping model, to the final CNN. The QT framework significantly reduces the number of trainable parameters by leveraging quantum computation, with gradients evaluated and parameters updated iteratively to optimize the CNN's performance.
  • Figure 3: Comparison of real (left: a, b, c) and deepfaked (right: d, e, f) audio representations. Panels (a) and (d) show time-domain waveforms, (b) and (e) display spectrograms, and (c) and (f) present MFCCs, highlighting key audio features.
  • Figure 4: Performance comparison of Quantum-Train (QT) CNN and classical CNN models for deepfake speech recognition, demonstrating accuracy and parameter efficiency across varying numbers of QNN blocks. The QT approach achieves comparable accuracy to classical training methods while significantly reducing the parameter count, as indicated by the increasing parameter ratio curve.
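The pipeline in Figures 2 and 4 runs from an N-qubit QNN of parameterized Ry gates, through a mapping model, to the CNN weights. It can be illustrated with a small NumPy statevector simulation. This sketch reflects several assumptions rather than the paper's exact design: a linear CNOT chain as the entangler, N = ceil(log2 m) qubits for a target CNN with m weights, and a tiny tanh MLP that maps each basis state's bit string and measured probability to one weight. The target size m = 2000, the hidden width, and every function name are hypothetical. The training loop, in which gradients of the CNN loss update both the QNN angles and the mapping weights, is omitted.

```python
import numpy as np

def apply_ry(state, theta, q, n):
    """Apply Ry(theta) to qubit q of an n-qubit real statevector (qubit 0 = most significant bit)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    idx = np.arange(state.size)
    shift = n - 1 - q
    bit = (idx >> shift) & 1
    partner = idx ^ (1 << shift)  # index with qubit q flipped
    # New |0> amplitude: c*a - s*b ; new |1> amplitude: s*a + c*b
    return np.where(bit == 0, c * state - s * state[partner], s * state[partner] + c * state)

def apply_cnot(state, control, target, n):
    """CNOT as an index permutation; CNOT is self-inverse, so new[j] = old[f(j)]."""
    idx = np.arange(state.size)
    ctrl = (idx >> (n - 1 - control)) & 1
    return state[idx ^ (ctrl << (n - 1 - target))]

def qnn_probabilities(thetas):
    """thetas has shape (n_blocks, n_qubits); returns |<i|psi>|^2 for all 2^n basis states."""
    n_blocks, n = thetas.shape
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |00...0>
    for b in range(n_blocks):
        for q in range(n):  # one layer of parameterized Ry gates
            state = apply_ry(state, thetas[b, q], q, n)
        for q in range(n - 1):  # assumed linear CNOT-chain entangler
            state = apply_cnot(state, q, q + 1, n)
    return state ** 2  # amplitudes stay real because Ry and CNOT are real

def map_to_weights(probs, n, m, W1, b1, W2, b2):
    """Hypothetical mapping MLP: (bit string of i, p_i) -> one classical CNN weight, for i < m."""
    idx = np.arange(m)
    bits = ((idx[:, None] >> np.arange(n - 1, -1, -1)) & 1).astype(float)  # shape (m, n)
    x = np.hstack([bits, probs[:m, None]])  # shape (m, n + 1)
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).ravel()  # shape (m,)

rng = np.random.default_rng(0)
m = 2000                       # hypothetical number of CNN weights
n = int(np.ceil(np.log2(m)))   # 11 qubits -> 2048 basis states >= m
n_blocks, hidden = 4, 8
thetas = rng.uniform(0, 2 * np.pi, size=(n_blocks, n))
W1 = rng.normal(0, 0.1, size=(n + 1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, size=(hidden, 1));     b2 = np.zeros(1)

probs = qnn_probabilities(thetas)
cnn_weights = map_to_weights(probs, n, m, W1, b1, W2, b2)
trainable = thetas.size + W1.size + b1.size + W2.size + b2.size
print(cnn_weights.shape, trainable, f"{trainable / m:.4f}")  # (2000,) 157 0.0785
```

In this toy setting, the trainable count is n_blocks * N QNN angles plus the mapping-model weights. Adding QNN blocks therefore raises the trainable-to-target parameter ratio, which matches the rising ratio curve described for Figure 4. The specific numbers here are illustrative only and are not the paper's results.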