Identity-Driven Multimedia Forgery Detection via Reference Assistance

Junhao Xu; Jingjing Chen; Xue Song; Feng Han; Haijun Shan; Yugang Jiang

TL;DR

The paper tackles identity-driven multimedia forgery detection by introducing IDForge, a dataset of 249,138 video shots sourced from 324 in-the-wild videos of 54 celebrities, whose fake shots involve 9 types of manipulation across the visual, audio, and textual modalities. A reference set of 214,438 additional real video shots of the same 54 celebrities supports identity grounding, for 463,576 shots in total. The paper also proposes R-MFDN, a three-branch multimodal network that uses identity-aware and cross-modal contrastive learning to exploit identity priors and inter-modal inconsistencies, and it reports state-of-the-art detection and multi-label forgery-type classification on IDForge. Experiments, including ablations, indicate that identity-aware training and cross-modal cues improve performance and that multimodal fusion outperforms unimodal baselines. The work offers a resource for robust multimedia forgery detection and shows the practical value of reference identity information for verifying authenticity in real-world scenarios.

Abstract

Recent advancements in "deepfake" techniques have paved the way for generating various media forgeries. In response to the potential hazards of these forgeries, many researchers are exploring detection methods, which increases the demand for high-quality media forgery datasets. Existing datasets, however, have several limitations. Firstly, most datasets focus on manipulating the visual modality and usually lack diversity, as only a few forgery approaches are considered. Secondly, the media is often inadequate in clarity and naturalness, and the datasets are limited in size. Thirdly, real-world forgeries are commonly motivated by identity, yet the identity information of the individuals portrayed in these forgeries remains under-explored in existing datasets. For detection, identity information could be an essential clue to boost performance. Moreover, official media concerning the relevant identities on the Internet can serve as prior knowledge, aiding both the audience and forgery detectors in determining the true identity. Therefore, we propose an identity-driven multimedia forgery dataset, IDForge, which contains 249,138 video shots sourced from 324 wild videos of 54 celebrities collected from the Internet. The fake video shots involve 9 types of manipulation across visual, audio, and textual modalities. Additionally, IDForge provides an extra 214,438 real video shots as a reference set for the 54 celebrities. Correspondingly, we propose the Reference-assisted Multimodal Forgery Detection Network (R-MFDN), which aims to detect deepfake videos. Through extensive experiments on the proposed dataset, we demonstrate the effectiveness of R-MFDN on the multimedia detection task.

Paper Structure (19 sections, 8 equations, 5 figures, 4 tables)

INTRODUCTION
RELATED WORK
Identity Forgery
Media Forgery Detection
IDForge Dataset
Pristine Data Collection
Individual-targeted Manipulation
Statistics and Comparisons
Ethics Statement
METHODOLOGY
Multi-modal Feature Learning
Cross-modal Contrastive Learning
Identity-aware Contrastive Learning
Forgery Classification
EXPERIMENTS
...and 4 more sections

Figures (5)

Figure 1: The proposed IDForge dataset involves manipulation across modalities to create false identities, using techniques such as text manipulation, voice cloning, lip-syncing, and face-swapping.
Figure 2: Statistical information of the IDForge dataset.
Figure 3: Feature visualization. VideoMAE, Xception, and BERT are utilized to extract visual, spectrogram, and textual features, respectively. Then t-SNE is used for dimensionality reduction and visualization.
Figure 4: Network overview. The proposed R-MFDN introduces identity-aware contrastive learning to learn identity-sensitive features and captures cross-modal inconsistency through cross-modal contrastive learning.
Figure 5: Comparison between w/o identity and R-MFDN.
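
The caption of Figure 4 mentions cross-modal contrastive learning, but this summary does not give the loss that R-MFDN uses. As a rough illustration of the general idea only, the sketch below implements a generic symmetric InfoNCE-style objective, a common choice for this kind of training and not necessarily the authors' formulation. The function names, the temperature value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to candidates[i].

    Every other candidate in the batch acts as a negative. This is a generic
    InfoNCE loss, assumed here for illustration.
    """
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def cross_modal_contrastive_loss(visual, audio, temperature=0.1):
    """Symmetric loss: visual-to-audio and audio-to-visual matching, averaged."""
    return 0.5 * (
        info_nce(visual, audio, temperature)
        + info_nce(audio, visual, temperature)
    )


# Toy embeddings (hypothetical): row i of each list comes from the same clip.
visual = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]   # audio agrees with its own clip
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]   # audio agrees with the other clip

loss_aligned = cross_modal_contrastive_loss(visual, audio_aligned)
loss_swapped = cross_modal_contrastive_loss(visual, audio_swapped)
print(loss_aligned, loss_swapped)
```

In this toy setup the loss is lower when each clip's audio embedding matches its own visual embedding than when the pairs are swapped, which is the kind of consistency signal a cross-modal objective can expose to a detector. An identity-aware variant could plausibly reuse the same function with reference-set embeddings of the same person as positives, though that pairing is an assumption and is not described in this summary.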
