
ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection

Mohammad Romani

TL;DR

Deepfakes increasingly evade single-branch detectors, motivating robust, multi-domain forensic analysis. ForensicFlow integrates three specialized streams (RGB-Spatial, Texture-Microscopic, and Frequency Analysis) with temporal attention and adaptive fusion to jointly exploit appearance, texture, and spectral cues. Trained with progressive unfreezing and Focal Loss, it achieves an AUC of 0.9752 and an F1 score of 0.9408 on Celeb-DF (v2), and Grad-CAM analyses confirm that it focuses on genuine manipulation regions. The work demonstrates that cross-domain fusion, temporal prioritization, and interpretable signals can substantially improve resilience to evolving deepfake techniques and support practical forensic deployment.
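The training recipe relies on Focal Loss to down-weight easy examples. As a minimal sketch, assuming the standard binary form from Lin et al. (2017) with label 1 meaning fake, the per-example loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The function name and the defaults α = 0.25 and γ = 2 are assumptions taken from that paper; the values ForensicFlow actually uses are not stated here.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss (hypothetical helper, standard Lin et al. form).

    probs:  predicted probabilities that each clip is fake (label 1).
    labels: ground-truth labels, 1 = fake, 0 = real.
    alpha, gamma: assumed defaults, not ForensicFlow's reported settings.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of confidently correct examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

Setting `gamma=0.0` recovers an α-weighted binary cross-entropy, which makes the focusing term easy to check in isolation.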

Abstract

Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to the forgery type. Trained on Celeb-DF (v2) with Focal Loss, the model achieves an AUC of 0.9752, an F1 score of 0.9408, and an accuracy of 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations show that the model focuses on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy improves robustness against increasingly sophisticated forgeries.
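The two aggregation steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: temporal pooling is modeled as softmax attention over per-frame scores from a linear scoring vector `w`, and adaptive fusion as a softmax gate computed from the concatenated branch features, so the branch weights depend on the input. The helper names, the linear scoring and gating, and the requirement that all branches share one feature dimension are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(frame_feats, w):
    """Attention-weighted average of T per-frame feature vectors.

    frame_feats: list of T vectors, each of length D (one stream's frame features).
    w: hypothetical scoring vector of length D (learned in a real model).
    Returns (pooled vector of length D, list of T frame weights summing to 1).
    """
    alphas = softmax([dot(w, f) for f in frame_feats])
    dim = len(frame_feats[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frame_feats)) for d in range(dim)]
    return pooled, alphas


def adaptive_fusion(branch_vecs, gate_weights):
    """Input-dependent weighted sum of per-branch vectors.

    branch_vecs: K pooled vectors (e.g. RGB, texture, frequency), all of length D.
    gate_weights: K hypothetical gating vectors, each of length K * D, applied
    to the concatenated branch features to produce one logit per branch.
    Returns (fused vector of length D, list of K branch weights summing to 1).
    """
    concat = [x for v in branch_vecs for x in v]
    gates = softmax([dot(g, concat) for g in gate_weights])
    dim = len(branch_vecs[0])
    fused = [sum(g * v[d] for g, v in zip(gates, branch_vecs)) for d in range(dim)]
    return fused, gates
```

With a zero scoring vector every frame receives weight 1/T and pooling reduces to a plain mean; larger scores concentrate the weight on the highest-evidence frames. Because the gate logits are computed from the features themselves, different inputs can shift weight toward different branches.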

Paper Structure

This paper contains 23 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: ForensicFlow architecture. Three parallel forensic streams process video frames, with temporal attention and adaptive fusion enabling robust deepfake detection.
  • Figure 2: Grad-CAM visualization showing attention maps for real (a) and deepfake (b) samples. Warm colors (yellow/red) indicate regions most influential for the forgery decision.
  • Figure 3: Training dynamics showing loss curves, with vertical lines marking the progressive unfreezing stages. The validation loss stays stable despite aggressive unfreezing, indicating robust training behavior.
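Figure 3 marks progressive unfreezing stages, but the exact schedule is not given here. The sketch below assumes a hypothetical schedule in which the fusion head and temporal attention train from the first epoch and backbone stages are unfrozen one at a time from the top down, with each later group receiving a smaller learning rate (a common discriminative-learning-rate heuristic). The group names, epochs, `base_lr`, and `decay` are all illustrative assumptions.

```python
# Hypothetical schedule: (epoch at which the group becomes trainable, group name).
# Groups are listed from the task head down toward earlier backbone stages.
SCHEDULE = [
    (0, "fusion_head"),
    (0, "temporal_attention"),
    (3, "backbone_stage4"),
    (6, "backbone_stage3"),
    (9, "backbone_stage2"),
]


def group_learning_rates(epoch, schedule=SCHEDULE, base_lr=1e-4, decay=0.5):
    """Return {group name: learning rate}; 0.0 means the group is still frozen.

    Groups that become trainable at the k-th distinct start epoch (k = 0 for
    groups trained from the beginning) get base_lr * decay ** k, so stages
    unfrozen later, i.e. deeper in the backbone, learn more slowly.
    """
    starts = sorted({start for start, _ in schedule})
    rates = {}
    for start, name in schedule:
        if epoch < start:
            rates[name] = 0.0
        else:
            rates[name] = base_lr * decay ** starts.index(start)
    return rates
```

In a PyTorch training loop, a rate of 0.0 would typically correspond to setting `requires_grad = False` on that group's parameters, and nonzero rates would be written into the matching optimizer parameter groups at each stage boundary.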