Forensics Adapter: Unleashing CLIP for Generalizable Face Forgery Detection

Xinjie Cui, Yuezun Li, Delong Zhu, Jiaran Zhou, Junyu Dong, Siwei Lyu

TL;DR

This work introduces Forensics Adapter, a lightweight adapter placed alongside CLIP that learns task-specific forgery traces (blending boundaries) and uses an interaction strategy to guide CLIP toward forgery-relevant knowledge. With only 5.7M trainable parameters, it improves cross-dataset AUC by about 7% on average across five standard datasets. An extended version, Forensics Adapter++, adds a forgery-aware prompt learning scheme that exploits the textual modality, yielding a further ~1.3% improvement. Together, the two methods provide a strong baseline for CLIP-based face forgery detection, with robust generalization across diverse forgery distributions.
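The TL;DR names blending boundaries as the forgery trace the adapter learns, but this summary does not say how the boundary targets are built. As an assumption, the sketch below uses the formulation popularized by Face X-ray: given a soft blending mask M (1 inside the swapped face, 0 outside), the boundary map is B = 4 * M * (1 - M), which peaks at 1 where M = 0.5 and vanishes in both pure regions. The helper names `box_blur` and `blending_boundary` are hypothetical, and pure Python lists stand in for image tensors.

```python
def box_blur(mask, radius=1):
    """Soften a hard 0/1 mask by averaging each pixel's (2r+1)x(2r+1) neighborhood.

    Neighborhoods are clipped at the image border, so edge pixels average fewer values.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [mask[y][x]
                    for y in range(max(0, i - radius), min(h, i + radius + 1))
                    for x in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def blending_boundary(mask):
    """Face X-ray style boundary map B = 4 * M * (1 - M) (assumed target, not the paper's code)."""
    boundary = []
    for row in mask:
        out_row = []
        for m in row:
            if not 0.0 <= m <= 1.0:
                raise ValueError(f"mask values must lie in [0, 1], got {m}")
            out_row.append(4.0 * m * (1.0 - m))
        boundary.append(out_row)
    return boundary


if __name__ == "__main__":
    # A 1-D soft transition from background (0) to face (1).
    print(blending_boundary([[0.0, 0.25, 0.5, 0.75, 1.0]]))  # [[0.0, 0.75, 1.0, 0.75, 0.0]]
```

A hard 0/1 mask yields an all-zero boundary, which is why the mask is softened first: only blended pixels (0 < M < 1) carry boundary signal, and a pristine face (M all zeros) produces an all-zero target.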

Abstract

We describe Forensics Adapter, an adapter network designed to transform CLIP into an effective and generalizable face forgery detector. Although CLIP is highly versatile, adapting it for face forgery detection is non-trivial because forgery-related knowledge is entangled with a wide range of unrelated knowledge. Existing methods treat CLIP merely as a feature extractor without task-specific adaptation, which limits their effectiveness. To address this, we introduce an adapter that learns face forgery traces, namely the blending boundaries unique to forged faces, under the guidance of task-specific objectives. We then enhance the CLIP visual tokens with a dedicated interaction strategy that communicates knowledge between CLIP and the adapter. Because the adapter runs alongside CLIP, CLIP's versatility is largely retained, which naturally supports strong generalization in face forgery detection. With only 5.7M trainable parameters, our method achieves a significant performance boost, improving by approximately 7% on average across five standard datasets. Additionally, we describe Forensics Adapter++, an extended method that incorporates the textual modality via a newly proposed forgery-aware prompt learning strategy. This extension yields a further 1.3% performance boost over the original Forensics Adapter. We believe the proposed methods can serve as a baseline for future CLIP-based face forgery detection methods. The code has been released at https://github.com/OUC-VAS/ForensicsAdapter.
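The abstract says CLIP's visual tokens are enhanced through an interaction strategy that passes knowledge from the adapter to CLIP, without specifying the mechanism. The sketch below is a hypothetical illustration using single-head scaled dot-product cross-attention: each CLIP token acts as a query over the adapter tokens, and the attended result is added back residually. The function `cross_attend` and its `scale` parameter are assumptions; learned query/key/value projections are omitted for brevity, and CLIP itself is assumed to stay frozen while only adapter-side parameters would be trained.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(clip_tokens, adapter_tokens, scale=1.0):
    """Hypothetical interaction: CLIP tokens query adapter tokens, result added residually.

    clip_tokens:    list of N vectors (queries), each of length d
    adapter_tokens: list of M vectors (keys and values), each of length d
    scale:          weight on the injected adapter information (0 leaves CLIP unchanged)
    """
    d = len(clip_tokens[0])
    enhanced = []
    for q in clip_tokens:
        # Attention weights of this CLIP token over all adapter tokens.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in adapter_tokens])
        # Weighted sum of adapter tokens, one output channel at a time.
        attended = [sum(w * k[c] for w, k in zip(weights, adapter_tokens)) for c in range(d)]
        # Residual update keeps the original CLIP token and adds adapter knowledge.
        enhanced.append([qc + scale * ac for qc, ac in zip(q, attended)])
    return enhanced
```

The residual form mirrors the motivation in the abstract: with `scale=0` the CLIP tokens pass through untouched, so forgery-specific information is injected without overwriting CLIP's general-purpose representation.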

Paper Structure

This paper contains 17 sections, 11 equations, 5 figures, and 17 tables.

Figures (5)

  • Figure 1: (a) Existing CLIP-based methods. (b) The proposed Forensics Adapter, which achieves the best performance compared with several state-of-the-art methods on the CDF-v1 [celeb], CDF-v2 [celeb], DFDC [dfdc], DFDCP [dfdcp], and DFD [dfd] datasets.
  • Figure 2: Pipeline of the proposed Forensics Adapter during training. The top stream denotes CLIP and the bottom stream corresponds to the adapter. See text for details.
  • Figure 3: Pipeline of Forensics Adapter++. The upper part illustrates the generation of forgery-aware prompts and masks. The lower part shows the training and testing process. Note that the textual modality is used only in the training phase, where it serves as an auxiliary signal to improve the effectiveness of the visual modality. The diagram highlights the extension to the textual modality and leaves the other components consistent with the original Forensics Adapter.
  • Figure 4: Robustness analysis. Our method is compared with CLIP [clip], IID [iid], and LSDA [lsda] across five levels of six types of perturbations, measured by video-level AUC.
  • Figure 5: t-SNE visualizations. The left panel compares the feature distributions of ForAda (Forensics Adapter) and CLIP, while the right panel presents the corresponding comparison between ForAda++ and CLIP.
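Figure 4 reports robustness in video-level AUC. The summary does not say how frame predictions are aggregated per video; a common convention, assumed here, is to average the frame-level fake probabilities of each video and then compute ROC AUC over videos. The sketch computes AUC through the Mann-Whitney statistic (the probability that a random fake video scores higher than a random real one, with ties counted as one half). The function names and the `(video_id, label, score)` record layout are hypothetical.

```python
from collections import defaultdict


def video_level_scores(frame_records):
    """Average frame scores per video (assumed aggregation rule).

    frame_records: iterable of (video_id, label, score), label 1 = fake, 0 = real.
    Returns a list of (label, mean_score) pairs, one per video.
    """
    scores = defaultdict(list)
    labels = {}
    for vid, label, score in frame_records:
        if vid in labels and labels[vid] != label:
            raise ValueError(f"inconsistent labels for video {vid!r}")
        labels[vid] = label
        scores[vid].append(score)
    return [(labels[v], sum(s) / len(s)) for v, s in scores.items()]


def auc(pairs):
    """ROC AUC via the Mann-Whitney U statistic; ties contribute 0.5."""
    pos = [s for y, s in pairs if y == 1]
    neg = [s for y, s in pairs if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    frames = [("real_a", 0, 0.1), ("real_a", 0, 0.3), ("real_b", 0, 0.6),
              ("fake_a", 1, 0.7), ("fake_a", 1, 0.9), ("fake_b", 1, 0.5)]
    print(auc(video_level_scores(frames)))  # 0.75
```

The pairwise loop is O(P x N) and meant only for illustration; for large test sets, a sort-based rank computation gives the same value in O(n log n).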