Towards Interactive Deepfake Analysis

Lixiong Qin, Ning Jiang, Yang Zhang, Yuhan Qiu, Dingheng Zeng, Jiani Hu, Weihong Deng

TL;DR

The paper addresses the limitations of traditional discriminative deepfake analysis by proposing an interactive deepfake analysis (DFA) framework powered by instruction-tuned multi-modal LLMs. It introduces DFA-Instruct, a large instruction-following dataset, and DFA-Bench, a comprehensive benchmark, enabling evaluation of four capabilities: deepfake detection (DF-D), deepfake classification (DF-C), artifact description (AD), and free conversation (FC). The authors implement DFA-GPT, a LoRA-tuned system that combines CLIP-based vision encoding with a Vicuna LLM, to demonstrate interactive deepfake analysis under resource constraints. Key contributions include public datasets, a benchmark, and a strong baseline that outperforms vision-only models in detection and classification while adding artifact description and free conversation abilities. This work lays the groundwork for integrating interactive, language-driven reasoning into forensic deepfake analysis and invites broader community development.

Abstract

Existing deepfake analysis methods are primarily based on discriminative models, which significantly limits their application scenarios. This paper explores interactive deepfake analysis by performing instruction tuning on multi-modal large language models (MLLMs). This approach faces challenges such as the lack of datasets and benchmarks, and low training efficiency. To address these issues, we introduce (1) a GPT-assisted data construction process resulting in an instruction-following dataset called DFA-Instruct, (2) a benchmark named DFA-Bench, designed to comprehensively evaluate the capabilities of MLLMs in deepfake detection, deepfake classification, and artifact description, and (3) an interactive deepfake analysis system called DFA-GPT, built with the Low-Rank Adaptation (LoRA) module, as a strong baseline for the community. The dataset and code will be made available at https://github.com/lxq1000/DFA-Instruct to facilitate further research.
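
The abstract names Low-Rank Adaptation (LoRA) as the answer to low training efficiency. As a rough illustration of the general technique only, the sketch below keeps a pretrained weight matrix frozen and trains two small low-rank factors instead. It is not the DFA-GPT implementation: the class name `LoRALinear`, the layer sizes, the rank `r`, and the scaling `alpha` are all assumptions chosen for the example, not values reported in the paper.

```python
# Illustrative sketch of a LoRA-adapted linear layer (pure Python, no dependencies).
# All sizes, the rank r, and the scaling alpha are assumptions, not values from the paper.
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    def __init__(self, d_in, d_out, r, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight W (d_out x d_in); it is not updated during tuning.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors: A is random and B starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        """Return W x + (alpha / r) * B (A x) for a vector x of length d_in."""
        col = [[v] for v in x]
        base = matmul(self.W, col)
        delta = matmul(self.B, matmul(self.A, col))
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (r x d_in) and B (d_out x r) would receive gradients.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, r=4, alpha=8)
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

In this toy configuration the adapter trains r * (d_in + d_out) parameters rather than d_in * d_out, which is where the training-efficiency gain of low-rank adaptation comes from.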

Paper Structure (12 sections, 2 equations, 4 figures, 3 tables)

This paper contains 12 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An interactive deepfake analysis system. Questions representing the four abilities are shown separately. Responses are generated by our DFA-GPT.
  • Figure 2: Data construction process.
  • Figure 3: Statistics of our proposed DFA-Instruct.
  • Figure 4: The overall architecture of our DFA-GPT.