Deepfake Detection with Optimized Hybrid Model: EAR Biometric Descriptor via Improved RCNN

Ruchika Sharma, Rudresh Dwivedi

TL;DR

This work tackles the escalating challenge of deepfake detection by leveraging ear biometrics as a stable descriptor extracted via an improved RCNN. It combines a hybrid detector comprising Bi-GRU and Deep Belief Network, with weights optimized by the Self-Upgraded Jellyfish Optimization (SU-JFO) method and an enhanced score-level fusion, to distinguish real from fake content. Across WLDR, DeepfakeTIMIT, and Celeb-DF datasets, the proposed approach consistently surpasses traditional CNN-, LSTM-, and hybrid-based baselines under compression, noise, rotation, pose, and illumination, achieving higher accuracy, precision, and MCC. The method advances robustness and reliability of deepfake detection, offering practical potential for media integrity and security applications, and suggests future work integrating emotion-related cues with ear dynamics for further improvements.

Abstract

Deepfake technology has been widely used in recent years to create pernicious content such as fake news, movies, and rumors by altering and substituting facial information from various sources. Given the ongoing evolution of deepfakes, continued investigation into their identification and prevention is crucial. Owing to recent advances in AI (Artificial Intelligence), distinguishing deepfakes and artificially altered images has become challenging. This approach introduces robust detection of subtle ear movements and shape changes to generate ear descriptors. We also propose a novel optimized hybrid deepfake detection model that considers the ear biometric descriptors obtained via an enhanced RCNN (Region-Based Convolutional Neural Network). Initially, the input video is converted into frames and preprocessed through resizing, normalization, grayscale conversion, and filtering, followed by face detection using the Viola-Jones technique. Next, a hybrid model comprising a DBN (Deep Belief Network) and a Bi-GRU (Bidirectional Gated Recurrent Unit) is used for deepfake detection based on the ear descriptors. The output of the detection phase is determined through improved score-level fusion. To enhance performance, the weights of both detection models are optimally tuned using the SU-JFO (Self-Upgraded Jellyfish Optimization) method. Experiments are conducted under compression, noise, rotation, pose, and illumination scenarios on three different datasets. The results affirm that the proposed method outperforms traditional models such as CNN (Convolutional Neural Network), SqueezeNet, LeNet, LinkNet, LSTM (Long Short-Term Memory), DFP (Deepfake Predictor) [1], and ResNext+CNN+LSTM [2] in terms of performance metrics such as accuracy, specificity, and precision.
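
The abstract does not spell out how the "improved score-level fusion" combines the two detectors, so the following is only a minimal sketch of plain weighted-sum score-level fusion, not the authors' formulation. The function names `fuse_scores` and `classify_video`, the fusion weight `weight_a`, the decision `threshold`, and the per-frame averaging step are all assumptions introduced for illustration. Each detector is assumed to emit a per-frame probability in [0, 1] that the frame is fake.

```python
from typing import Sequence


def fuse_scores(score_a: float, score_b: float, weight_a: float = 0.5) -> float:
    """Weighted-sum fusion of two detector scores (hypothetical helper).

    score_a and score_b are assumed to be "fake" probabilities in [0, 1],
    e.g. one from a DBN branch and one from a Bi-GRU branch.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return weight_a * score_a + (1.0 - weight_a) * score_b


def classify_video(
    dbn_scores: Sequence[float],
    bigru_scores: Sequence[float],
    weight_a: float = 0.5,
    threshold: float = 0.5,
) -> str:
    """Fuse per-frame scores, average over frames, and threshold the result.

    Averaging over frames is an assumption of this sketch; the source does
    not state how frame-level outputs are aggregated.
    """
    if not dbn_scores or len(dbn_scores) != len(bigru_scores):
        raise ValueError("score sequences must be non-empty and equal in length")
    fused = [fuse_scores(a, b, weight_a) for a, b in zip(dbn_scores, bigru_scores)]
    mean_score = sum(fused) / len(fused)
    return "fake" if mean_score >= threshold else "real"


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    print(classify_video([0.9, 0.8], [0.7, 0.6]))
    print(classify_video([0.1, 0.2], [0.3, 0.2]))
```

In this sketch the fusion weight is a fixed hyperparameter. The abstract describes SU-JFO as tuning the weights of the two detection models themselves, which is a separate step not shown here.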

Paper Structure

This paper contains 40 sections, 45 equations, 33 figures, 15 tables, 1 algorithm.

Figures (33)

  • Figure 2: Conventional RCNN architecture.
  • Figure 3: Improved RCNN architecture.
  • Figure 4: Architecture of Bi-GRU model.
  • Figure 5: Architecture of DBN model.
  • Figure 6: Flowchart of the SU-JFO algorithm.
  • ...and 28 more figures