Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection

Hao Wang, Cheng Deng, Zhidong Zhao

TL;DR

This work tackles deepfake facial image detection by integrating prior knowledge and mitigating domain shift. It introduces Knowledge-Guided Prompt Learning (KGP), which retrieves forgery-related concepts from a large language model to form informative prompts within a CLIP framework, and Test-Time Prompt Tuning (TTP), which uses pseudo labels to adapt prompts on the test data without ground-truth labels. The method yields notable gains on the DeepFakeFaceForensics dataset, outperforming state-of-the-art CLIP-based and conventional detection methods while using few trainable parameters. The approach offers practical implications for real-world deployment and potential extensions to cross-domain anomaly detection and forgery representation learning without downsampling.

Abstract

Recent generative models synthesize photographic images so convincingly that humans can hardly distinguish them from pristine ones, especially in the case of realistic-looking synthetic facial images. Previous works mostly focus on mining discriminative artifacts from vast amounts of visual data. However, they usually do not exploit prior knowledge and rarely pay attention to the domain shift between training categories (e.g., natural and indoor objects) and testing ones (e.g., fine-grained human facial images), resulting in unsatisfactory detection performance. To address these issues, we propose a novel knowledge-guided prompt learning method for deepfake facial image detection. Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts. In addition, we devise test-time prompt tuning to alleviate the domain shift, which significantly improves performance and facilitates application in real-world scenarios. Extensive experiments on the DeepFakeFaceForensics dataset show that our proposed approach notably outperforms state-of-the-art methods.

Paper Structure (10 sections, 6 equations, 3 figures, 3 tables)

This paper contains 10 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Existing approaches mostly focus on extracting forgery artifacts. However, they do not exploit expert knowledge and pay little attention to the domain shift between training categories and testing ones.
  • Figure 2: The overall framework, which consists of knowledge-guided prompt learning and test-time prompt tuning. The former elicits prior knowledge from a large language model to construct more meaningful prompts. The latter obtains pseudo labels for the test data and then tunes the prompts to alleviate domain shift.
  • Figure 3: Detection performance with different values of each hyper-parameter.
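
The test-time prompt tuning idea in the Figure 2 caption (obtain pseudo labels for the test data, then tune the prompts) can be illustrated with a small sketch. This is not the authors' implementation. It assumes toy 2-D vectors in place of CLIP image features, tunes per-class text embeddings directly rather than prompt tokens passed through a text encoder, and uses a confidence threshold to select pseudo labels. The names `class_probs` and `tune_prompts_at_test_time` and the values of `scale`, `threshold`, `lr`, and `steps` are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(image_feat, prompt_embs, scale):
    """CLIP-style scoring: scaled dot product between an image feature
    and each class embedding, followed by a softmax."""
    logits = [scale * sum(x * t for x, t in zip(image_feat, emb))
              for emb in prompt_embs]
    return softmax(logits)


def tune_prompts_at_test_time(image_feats, prompt_embs, scale=10.0,
                              threshold=0.7, lr=0.1, steps=20):
    """Adapt class embeddings on unlabeled test features.

    Assumption (not from the paper): a prediction is used as a pseudo
    label only if its confidence is at least `threshold`.
    """
    prompts = [list(emb) for emb in prompt_embs]  # leave the input untouched
    for _ in range(steps):
        grads = [[0.0] * len(emb) for emb in prompts]
        kept = 0
        for feat in image_feats:
            probs = class_probs(feat, prompts, scale)
            pseudo = max(range(len(probs)), key=probs.__getitem__)
            if probs[pseudo] < threshold:
                continue  # skip low-confidence samples
            kept += 1
            # Gradient of cross-entropy w.r.t. class embedding c:
            # scale * (p_c - y_c) * image_feat, with y the pseudo label.
            for c in range(len(prompts)):
                coeff = scale * (probs[c] - (1.0 if c == pseudo else 0.0))
                for d, x in enumerate(feat):
                    grads[c][d] += coeff * x
        if kept == 0:
            break  # nothing confident enough to learn from
        for c in range(len(prompts)):
            for d in range(len(prompts[c])):
                prompts[c][d] -= lr * grads[c][d] / kept
    return prompts


if __name__ == "__main__":
    # Toy test set: two "real-like" and two "fake-like" unit vectors.
    feats = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
    init = [[0.6, 0.4], [0.4, 0.6]]  # index 0 = real, index 1 = fake
    tuned = tune_prompts_at_test_time(feats, init)
    for feat in feats:
        print(class_probs(feat, init, 10.0), class_probs(feat, tuned, 10.0))
```

In this toy setting, only the confidently classified samples drive the first updates. Samples near the decision boundary join later, once the adapted embeddings push their confidence above the threshold. A real pipeline would need a more careful selection rule, since wrong pseudo labels reinforce themselves.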