FAMSeC: A Few-shot-sample-based General AI-generated Image Detection Method

Juncong Xu, Yang Yang, Han Fang, Honggu Liu, Weiming Zhang

TL;DR

FAMSeC is a general AI-generated image detection method that combines a LoRA-based Forgery Awareness Module (FAM) with a Semantic feature-guided Contrastive learning strategy (SeC), so that the FAM focuses on the differences between real and fake images rather than on the features of the samples themselves.

Abstract

The explosive growth of generative AI has saturated the internet with AI-generated images, raising security concerns and increasing the need for reliable detection methods. The primary requirement for such detection is generalizability, typically achieved by training on numerous fake images from various models. However, practical limitations, such as closed-source models and restricted access, often result in limited training samples. Therefore, training a general detector with few-shot samples is essential for modern detection mechanisms. To address this challenge, we propose FAMSeC, a general AI-generated image detection method based on a LoRA-based Forgery Awareness Module and a Semantic feature-guided Contrastive learning strategy. To learn effectively from limited samples and prevent overfitting, we developed a Forgery Awareness Module (FAM) based on LoRA, which preserves the generalization ability of the pre-trained features. Additionally, to work together with the FAM, we designed a Semantic feature-guided Contrastive learning strategy (SeC), which makes the FAM focus more on the differences between real and fake images than on the features of the samples themselves. Experiments show that FAMSeC outperforms the state-of-the-art method, improving classification accuracy by 14.55% while using just 0.56% of the training samples.
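
To illustrate why a LoRA-based module can adapt a model without disturbing its pre-trained features, the following is a minimal sketch of a single LoRA-augmented linear layer in plain Python. It is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, the rank, the scaling factor `alpha`, and the zero initialization of `B` are assumptions taken from the common LoRA convention, not settings reported in this summary.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration: rank, alpha, and the initialization scheme
    are assumptions, not values taken from the FAMSeC paper.
    """

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weight, kept frozen
        # A gets small random values and B starts at zero, so the
        # low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count the entries of A and B, the only parameters that would be trained."""
        rank, d_in = len(self.A), len(self.A[0])
        d_out = len(self.B)
        return rank * d_in + d_out * rank
```

Because `B` starts at zero, the layer initially reproduces the frozen layer's output exactly, and only `rank * (d_in + d_out)` values would be trained instead of `d_in * d_out`. This small trainable footprint is one plausible reason such a module suits few-shot training.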

Paper Structure

This paper contains 16 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The framework of our proposed FAMSeC. During the training phase, we use two CLIP:ViT models to perform semantic feature-guided contrastive learning (SeC). One CLIP:ViT with fixed parameters extracts semantically rich features to guide the contrastive learning, while the other CLIP:ViT acts as a feature extractor enhanced by a LoRA-based forgery awareness module (FAM) to learn the differences between real and fake images. During the testing phase, the features of the input image, extracted by the feature extractor, are compared with the features of real and fake images; the measured distances determine the prediction.
  • Figure 2: Diagram of the LoRA-based forgery awareness module (FAM). LoRA is applied to the $query$, $key$, $value$, and $output$ matrices of the multi-head attention modules in the last 12 ViT blocks of CLIP:ViT-L/14.
  • Figure 3: The t-SNE visualization of the feature space for the pretrained CLIP:ViT-L/14 and our FAMSeC.
  • Figure 4: The detection accuracy of our FAMSeC, UniFD [unifd], and CNNDet [wang] across three cross-model datasets with different training sample sizes. Note that all models are trained using the training set of the ForenSynths [wang] dataset.
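
The testing phase described in the Figure 1 caption, where the input image's features are compared with features of real and fake images, can be sketched roughly as follows. The caption does not state which distance is used or how the reference features are aggregated, so the cosine distance, the mean "prototype" per class, and the function names `cosine_distance`, `mean_feature`, and `predict` are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_feature(features):
    """Average a list of equal-length feature vectors element-wise."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def predict(feature, real_refs, fake_refs):
    """Label a feature vector by its nearer class prototype.

    Assumption: each class is summarized by the mean of its reference
    features, and closeness is measured with cosine distance.
    """
    d_real = cosine_distance(feature, mean_feature(real_refs))
    d_fake = cosine_distance(feature, mean_feature(fake_refs))
    return "fake" if d_fake < d_real else "real"
```

In practice the vectors would come from the FAM-enhanced feature extractor; here any lists of floats with matching lengths work, for example `predict([0.8, 0.2], [[1.0, 0.1], [0.9, 0.0]], [[0.0, 1.0], [0.1, 0.9]])`.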