Explainable Deepfake Video Detection using Convolutional Neural Network and CapsuleNet
Gazi Hasin Ishrak, Zalish Mahmud, MD. Zami Al Zunaed Farabe, Tahera Khanom Tinni, Tanzim Reza, Mohammad Zavid Parvez
TL;DR
This work tackles deepfake detection by pairing a CapsuleNet-based architecture with ConvLSTM and pre-trained CNN backbones to capture both spatial and temporal cues, augmented with Grad-CAM explanations for model transparency. Using the DFDC dataset, it preprocesses and labels frames, then trains a hybrid CapsuleNet-LSTM model that includes a ConvLSTM_2D layer, a Conv2D layer, primary and secondary capsule layers, and a Sequential LSTM, followed by dense layers. The study reports a validation accuracy of 88% and an AUC of 0.951 on full-frame data, with explainability via Grad-CAM showing attention to facial regions in real videos and attenuated signals in fake ones, outperforming a combined TemperNet/XceptionNet baseline. The approach demonstrates practical potential for robust, explainable deepfake detection in real-world scenarios, balancing accuracy with interpretability.
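The primary and secondary capsule layers mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the standard capsule "squash" nonlinearity and routing-by-agreement, which the summary does not confirm, and the names `squash`, `route`, `u_hat`, and the capsule counts and dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule nonlinearity: keeps the vector's direction and maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """Routing-by-agreement (assumed here, not confirmed by the source).

    u_hat: array of shape (n_primary, n_secondary, dim) holding the prediction
           each primary capsule makes for each secondary capsule.
    Returns the secondary capsule outputs, shape (n_secondary, dim).
    """
    n_primary, n_secondary, _ = u_hat.shape
    b = np.zeros((n_primary, n_secondary))          # routing logits
    for _ in range(num_iters):
        c = softmax(b, axis=1)                      # coupling coefficients per primary capsule
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum of predictions
        v = squash(s)                               # secondary capsule outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # raise logits where prediction agrees with output
    return v

# Hypothetical sizes: 32 primary capsules, 2 secondary capsules (real / fake), 8-D vectors.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 2, 8))
v = route(u_hat)
lengths = np.linalg.norm(v, axis=-1)                # capsule length acts as a class score
```

In this kind of design, the length of each secondary capsule vector is read as the score for its class, so a two-capsule output maps naturally onto a real-versus-fake decision.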
Abstract
Deepfake technology, derived from deep learning, seamlessly inserts individuals into digital media, irrespective of their actual participation. Its foundation lies in machine learning and Artificial Intelligence (AI). Initially, deepfakes served research, industry, and entertainment. While the concept has existed for decades, recent advancements render deepfakes nearly indistinguishable from reality. Accessibility has soared, empowering even novices to create convincing deepfakes. However, this accessibility raises security concerns. The primary deepfake creation algorithm, GAN (Generative Adversarial Network), employs machine learning to craft realistic images or videos. Our objective is to utilize CNN (Convolutional Neural Network) and CapsuleNet with LSTM to differentiate between deepfake-generated frames and originals. Furthermore, we aim to elucidate our model's decision-making process through Explainable AI, fostering transparent human-AI relationships and offering practical examples for real-life scenarios.
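To make the explainability step concrete, the core Grad-CAM computation (the explanation method named in the TL;DR) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the last convolutional layer's activations and the gradients of the class score with respect to them have already been extracted from the trained model, and the function name `grad_cam` and the toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap.

    feature_maps: (H, W, K) activations of a convolutional layer for one frame.
    gradients:    (H, W, K) gradients of the class score w.r.t. those activations.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(0, 1))                           # shape (K,)
    # Weighted sum of feature maps across channels.
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))      # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input: channel 0 supports the class at (1, 2); channel 1 opposes it at (3, 3).
acts = np.zeros((4, 4, 2))
acts[1, 2, 0] = 2.0
acts[3, 3, 1] = 5.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))], axis=-1)
heatmap = grad_cam(acts, grads)
```

The resulting heatmap is typically upsampled to the frame size and overlaid on the image, which is how attention to facial regions can be inspected.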
