PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection

Alvaro Lopez Pellcier, Yi Li, Plamen Angelov

TL;DR

The paper addresses robust deepfake detection that generalizes to unseen techniques and individuals. It introduces PUDD, a prototype-based framework that learns representative prototypes from original data, classifies videos and images by their similarity to those prototypes, and signals unseen concepts via an m-σ rule. Key contributions include a lightweight Prototype Learning Layer, an interpretable similarity-based decision process, and efficiency gains: retraining on new data takes 2.7 s and emits about $10^{5}$ times less carbon than the state-of-the-art model. PUDD reaches 95.1% accuracy on Celeb-DF and also performs strongly on CIFAKE. The approach handles both video and image inputs, is trained with image classification as the upstream task, and suits practical deployment because of its speed, interpretability, and environmental benefits.

Abstract

Deepfake techniques generate highly realistic data, making it challenging for humans to discern between actual and artificially generated images. Recent advancements in deep learning-based deepfake detection methods, particularly with diffusion models, have shown remarkable progress. However, there is a growing demand for real-world applications to detect unseen individuals, deepfake techniques, and scenarios. To address this limitation, we propose a Prototype-based Unified Framework for Deepfake Detection (PUDD). PUDD offers a detection system based on similarity, comparing input data against known prototypes for video classification and identifying potential deepfakes or previously unseen classes by analyzing drops in similarity. Our extensive experiments reveal three key findings: (1) PUDD achieves an accuracy of 95.1% on Celeb-DF, outperforming state-of-the-art deepfake detection methods; (2) PUDD leverages image classification as the upstream task during training, demonstrating promising performance in both image classification and deepfake detection tasks during inference; (3) PUDD requires only 2.7 seconds for retraining on new data and emits $10^{5}$ times less carbon compared to the state-of-the-art model, making it significantly more environmentally friendly.
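
The similarity-based decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity, the function names, and the exact form of the m-σ rule (flag a sample whose best similarity falls more than m standard deviations below the mean similarity seen on known data) are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_prototype(feature, prototypes):
    """Return (label, similarity) of the prototype closest to `feature`.

    `prototypes` maps a class label to a list of prototype vectors
    (hypothetical layout, chosen for this sketch).
    """
    best_label, best_similarity = None, -math.inf
    for label, vectors in prototypes.items():
        for vector in vectors:
            s = cosine_similarity(feature, vector)
            if s > best_similarity:
                best_label, best_similarity = label, s
    return best_label, best_similarity


def flag_similarity_drop(similarity, reference_similarities, m=3.0):
    """Assumed m-sigma rule: flag a sample whose similarity is more than
    m standard deviations below the mean of the reference similarities."""
    mu = mean(reference_similarities)
    sigma = stdev(reference_similarities)
    return similarity < mu - m * sigma


# Toy usage: two known identities, one prototype each.
prototypes = {"person_a": [[1.0, 0.0]], "person_b": [[0.0, 1.0]]}
reference = [0.95, 0.97, 0.96, 0.98, 0.94]  # similarities seen on known data

label, similarity = nearest_prototype([0.9, 0.1], prototypes)
is_suspicious = flag_similarity_drop(similarity, reference)
```

In this toy setup, a feature close to a known prototype keeps its class label, while a feature whose best similarity drops well below the reference range is flagged as a potential deepfake or unseen class.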

Paper Structure (23 sections, 2 equations, 6 figures, 4 tables)

This paper contains 23 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Prototype learning-based image classification with original images.
  • Figure 2: The proposed prototype learning-based framework. We extract frames from raw videos and crop them into small patches. The red lines apply only to the inference stage.
  • Figure 3: Prototype clustering visualizations.
  • Figure 4: Similarity/Density score drop in deepfake videos.
  • Figure 5: Challenging deepfakes in Celeb-DF. Black and green/red marks denote the detection predictions of MMtrace and PUDD, respectively.
  • ...and 1 more figure