C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, Yunchao Wei

TL;DR

This work investigates why CLIP-powered detectors generalize to unseen deepfakes and introduces C2P-CLIP, which injects category concepts into the image encoder through caption enhancement using category prompts. By training with a contrastive objective and a classification loss while freezing the text encoder and using LoRA on the image encoder, the method achieves substantial generalization gains without adding test-time parameters. Empirical results on UniversalFakeDetect and GenImage across 20 generation models show state-of-the-art or near-state-of-the-art performance and robust cross-model transfer, supported by qualitative analyses of logit distributions. The approach provides both a practical, parameter-efficient improvement for universal deepfake detection and insight into how CLIP features drive detection through concept-level matching rather than explicit real/fake semantics.

Abstract

This work focuses on AIGC detection, with the goal of developing universal detectors capable of identifying various types of forged images. Recent studies have found that large pre-trained models, such as CLIP, are effective for generalizable deepfake detection when combined with linear classifiers. However, two critical issues remain unresolved: 1) understanding why CLIP features are effective for deepfake detection through a linear classifier; and 2) exploring the detection potential of CLIP. In this study, we delve into the underlying mechanisms of CLIP's detection capabilities by decoding its detection features into text and performing word frequency analysis. Our finding indicates that CLIP detects deepfakes by recognizing similar concepts (Fig. 1a). Building on this insight, we introduce Category Common Prompt CLIP, called C2P-CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder, thereby enhancing detection performance (Fig. 1b). Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing. Comprehensive experiments conducted on two widely used datasets, encompassing 20 generation models, validate the efficacy of the proposed method, demonstrating state-of-the-art performance. The code is available at https://github.com/chuangchuangtan/C2P-CLIP-DeepfakeDetection
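
As a rough illustration of the concept-injection idea described above, the following minimal sketch shows how a generated caption might be enhanced with a category common prompt and how a contrastive term could be combined with a classification term. It is not the authors' implementation. The label-to-prompt mapping, the placement of the prompt at the start of the caption, the temperature, the loss weight, and all function names are assumptions made for illustration. The feature vectors are plain Python lists standing in for encoder outputs.

```python
import math

# Assumption: which prompt word belongs to which label, and where it is placed
# in the caption, are illustrative choices and not taken from the paper.
CATEGORY_PROMPTS = {0: "Camera", 1: "Deepfake"}  # 0 = real, 1 = fake


def enhance_caption(caption, label):
    """Prepend the category common prompt for `label` to a generated caption."""
    return f"{CATEGORY_PROMPTS[label]}. {caption.strip()}"


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def _cross_entropy(logits, target):
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over matched image/text pairs."""
    imgs = [_normalize(v) for v in image_feats]
    txts = [_normalize(t) for t in text_feats]
    n = len(imgs)
    sim = [[_dot(i, t) / temperature for t in txts] for i in imgs]
    image_to_text = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        _cross_entropy([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


def classification_loss(logits, labels):
    """Binary cross-entropy on raw classifier logits (label 1 = fake)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable form of softplus(z) - y * z.
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


def total_loss(image_feats, text_feats, logits, labels, weight=1.0):
    """Classification term plus a weighted contrastive term (weight is hypothetical)."""
    return classification_loss(logits, labels) + weight * contrastive_loss(
        image_feats, text_feats
    )
```

In this sketch only the image features and classifier logits would depend on trainable parameters; at test time the captions and text features are not needed, which is consistent with the claim that no parameters are added during testing.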

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Category Common Prompt CLIP. (a) To investigate the mechanism by which CLIP detects deepfakes, we decode the detection feature into text. Here, the detection features refer to the image features transformed by the linear classifier parameters. Our analysis of these texts reveals that the detection capability arises from the matching of similar concepts. In reality, CLIP does not comprehend "real" and "fake", but rather identifies analogous concepts. (b) Building on this insight, we propose a method to enhance the generalization capability of image encoders by introducing a category common prompt. This approach injects manually specified category concepts into the image encoder, aiming to improve its detection performance.
  • Figure 2: Analyzing CLIP Features in Deepfake Detection. (a) Decoding Image Feature to Text. We employ ClipCap (Mokady et al., 2021) to decode the image feature $v$ to text. (b) Decoding Detection Feature to Text. To discern the specific information within the image features that contributes to classification, we decode the detection features into text. The detection features are defined as the combination of the image features $v$ and the parameters of the linear classifier $fc$: $v * fc.weight + fc.bias$. Note that the mapping between image features and detection features is linear. The decoded text bears no direct relevance to the original image content. (c) t-SNE Visualization of Detection Features. We use t-SNE to visualize the detection features from the StyleGAN dataset and decode the textual representations of the three clustering centers within each subset.
  • Figure 3: Word Frequency Analysis on Various Sources. We conduct a word frequency analysis on the text decoded from detection features of both the training set (ProGAN) and the unseen test source (StyleGAN). The top 15 words are shown in the graph. The analysis reveals significant differences in word frequencies between the training and test sets. Notably, certain words present in the test set also appear in the training set. For instance, the word 'women' shows substantial frequency variation between (a) and (c). This observation supports the conclusion that CLIP achieves generalizable forgery detection by matching similar concepts or groups of concepts.
  • Figure 4: Architecture of C2P-CLIP for Generalizable Deepfake Detection. (a) Caption Generation and Enhancement. We obtain image captions using ClipCap and leverage category common prompts to enhance those captions. In this study, we adopt (Trump, Biden) and (Deepfake, Camera) as the category common prompts. (b) Concept Injection (Training stage). We use the text-image pairs to train the LoRA layers and the classifier with a contrastive loss and a classification loss. (c) Detection (Testing stage). Only the image encoder and the classifier are used to perform detection.
  • Figure 5: Logit distributions of extracted forgery features. We compare the baseline UniFD with our C2P-CLIP. Five test sources spanning GANs and diffusion models are considered: ProGAN, StyleGAN, Deepfake, LDM, and DALLE.
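
The detection-feature definition in the Figure 2 caption and the word-frequency comparison in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the expression $v * fc.weight + fc.bias$ is read here as an elementwise product with a scalar bias, the decoded texts are taken as given (the ClipCap decoding step is not reproduced), and the function names detection_feature, top_words, and shared_words are hypothetical.

```python
from collections import Counter


def detection_feature(v, weight, bias):
    """Elementwise v * weight + bias (assumed reading of the Figure 2 formula)."""
    return [x * w + bias for x, w in zip(v, weight)]


def top_words(texts, k=15):
    """Return the k most frequent whitespace-separated words across decoded texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts.most_common(k)


def shared_words(train_texts, test_texts, k=15):
    """Words that appear in the top-k lists of both the training and test sources."""
    train = {word for word, _ in top_words(train_texts, k)}
    test = {word for word, _ in top_words(test_texts, k)}
    return sorted(train & test)
```

Calling shared_words on texts decoded from a training source and an unseen test source gives a quick view of the overlapping concepts that the captions argue are responsible for generalization.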