Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective

Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Fuli Feng

TL;DR

This work identifies key generalization gaps in synthetic image detection (SID) pipelines, notably weakened artifact features from down-sampling and overfitted features from narrow augmentations, alongside heightened local pixel correlations in synthetic imagery. It proposes SAFE, a lightweight SID approach that combines crop-based preprocessing, ColorJitter and RandomRotation augmentations, and a patch-based random masking strategy, together with Discrete Wavelet Transform–based high-frequency features to preserve local cues. Across 26 generators, including both GANs and diffusion models, SAFE delivers state-of-the-art or near-state-of-the-art generalization with significantly lower computational cost, and it demonstrates strong online deployment performance. The results suggest that SID generalization benefits from training paradigms that preserve local artifacts, diversify appearances, and emphasize local-region cues over semantically rich but potentially architecture-specific features.
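The summary above mentions Discrete Wavelet Transform based high-frequency features but does not say which wavelet or which sub-bands are used. As a rough illustration only, the sketch below assumes a single-level Haar wavelet on a grayscale image stored as a nested list; the function name `haar_dwt2` and the band labels are hypothetical (LH/HL naming conventions differ between libraries). It shows how the three detail bands isolate local pixel differences while the LL band keeps the smooth content.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of a 2D list with even height and width.

    Returns (LL, LH, HL, HH); each band is half the input size.
    Assumption: orthonormal Haar filters, chosen only for illustration.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the 2x2 block
            ll.append((a + b + c + d) / 2)  # scaled local average (low frequency)
            lh.append((a + b - c - d) / 2)  # top minus bottom: change along rows
            hl.append((a - b + c - d) / 2)  # left minus right: change along columns
            hh.append((a - b - c + d) / 2)  # diagonal detail
        LL.append(ll)
        LH.append(lh)
        HL.append(hl)
        HH.append(hh)
    return LL, LH, HL, HH


# A flat image has no high-frequency content; a checkerboard is pure diagonal detail.
flat = [[5.0] * 4 for _ in range(4)]
checker = [[(-1.0) ** (i + j) for j in range(4)] for i in range(4)]

flat_bands = haar_dwt2(flat)
checker_bands = haar_dwt2(checker)
```

On the flat image all three detail bands are zero, while on the checkerboard the energy lands entirely in the HH band. A detector fed only the detail bands would therefore see pixel-level texture rather than image semantics, which is the intuition the summary points to.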

Abstract

With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has had negative effects on social platforms, creating an urgent need for effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, while overlooking the SID training paradigm. In this paper, we re-examine the SID problem and identify two prevalent biases in current training paradigms, i.e., weakened artifact features and overfitted artifact features. Meanwhile, we discover that the imaging mechanism of synthetic images contributes to heightened local correlations among pixels, suggesting that detectors should be equipped with local awareness. In this light, we propose SAFE, a lightweight and effective detector with three simple image transformations. Firstly, for weakened artifact features, we substitute the down-sampling operator with the crop operator in image pre-processing to help circumvent artifact distortion. Secondly, for overfitted artifact features, we include ColorJitter and RandomRotation as additional data augmentations, to help alleviate irrelevant biases from color discrepancies and semantic differences in limited training samples. Thirdly, for local awareness, we propose a patch-based random masking strategy tailored for SID, forcing the detector to focus on local regions during training. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 26 distinct generative models. Our pipeline achieves new state-of-the-art performance, with improvements of 4.5% in accuracy and 2.9% in average precision over existing methods. Our code is available at: https://github.com/Ouxiang-Li/SAFE.
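Two of the transformations described in the abstract, cropping instead of down-sampling and patch-based random masking, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `random_crop` and `random_patch_mask`, the patch size, the masking ratio, and the zero fill value are all hypothetical, and the image is a plain nested list. ColorJitter and RandomRotation are omitted here; transforms with those names exist in torchvision.

```python
import random


def random_crop(img, size, rng):
    """Cut a size x size window at a random position.

    Pixels are copied unchanged, so no interpolation touches local artifacts.
    """
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def random_patch_mask(img, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    `ratio` is the fraction of blocks to mask (hypothetical parameter).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # work on a copy, leave the input intact
    cells = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    n_masked = int(len(cells) * ratio)
    for i, j in rng.sample(cells, n_masked):
        for y in range(i, min(i + patch, h)):
            for x in range(j, min(j + patch, w)):
                out[y][x] = 0
    return out


# Toy 8x8 "image" with distinct non-zero values so masked pixels are easy to spot.
rng = random.Random(0)
img = [[i * 8 + j + 1 for j in range(8)] for i in range(8)]
crop = random_crop(img, 4, rng)
masked = random_patch_mask(crop, patch=2, ratio=0.5, rng=rng)
```

With a 4x4 crop, 2x2 patches, and a ratio of 0.5, two of the four patches are zeroed, so the detector in this toy setting would only see half of the local regions on any given pass.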

Paper Structure

This paper contains 24 sections, 7 equations, 12 figures, 8 tables, 1 algorithm.

Figures (12)

  • Figure 1: Two prevalent biases observed in current SID training paradigms. (a) Weakened artifact features: We reconstruct the real image using null-text inversion (Mokady et al., 2022) with Stable Diffusion v1.4, so that the real and reconstructed (fake) images are semantically consistent. We then compute their local correlation maps before and after down-sampling and subtract them for explicit comparison. The fake image exhibits stronger local correlations, and the down-sampling operator weakens such subtle artifacts. (b) Overfitted artifact features: We compare the logit distributions of the baseline (w/ HorizontalFlip only, left) and ours (right) for both seen and unseen generators. Applying HorizontalFlip alone is insufficient to alleviate overfitting to training samples, resulting in an extreme logit distribution for in-domain samples (i.e., ProGAN) and inferior generalization for out-of-domain samples (e.g., Midjourney).
  • Figure 2:
  • Figure 3: Examples of our proposed transformations in data augmentation, i.e., ColorJitter (CJ), RandomRotation (RR), and RandomMask (RM). In practice, these three augmentations are applied simultaneously along with HorizontalFlip.
  • Figure 4: Image pre-processing ablation. We ablate different data pre-processing operators at both training and testing on GenImage. The training operators include Bilinear-based Resize (BR), Nearest-based Resize (NR), and RandomCrop (RC). The testing operators include BR, NR, RC, CenterCrop (CC), and Source Image (SI), where SI indicates inference w/o any pre-processing. Our pipeline with RC (training) and CC (testing) is marked in white.
  • Figure 5: Image augmentation ablation. We ablate the introduced data augmentation techniques, including HorizontalFlip (HF), RandomRotation (RR), RandomMask (RM), and ColorJitter (CJ), where "$+$" and "$-$" indicate w/ and w/o a specific augmentation, respectively. Our pipeline combined with all augmentations is marked in white.
  • ...and 7 more figures
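
The Figure 1(a) caption states that down-sampling weakens local correlations, but the caption does not define how the local correlation map is computed. The toy sketch below is therefore only an illustration under explicit assumptions: "local correlation" is taken to be the Pearson correlation between horizontally adjacent pixels, the "image" is a set of synthetic AR(1) rows, and down-sampling is modelled as keeping every second pixel (closest to the nearest-based resize of Figure 4). All function names are hypothetical.

```python
import random


def neighbor_corr(img):
    """Pearson correlation between each pixel and its right-hand neighbour."""
    xs = [row[j] for row in img for j in range(len(row) - 1)]
    ys = [row[j + 1] for row in img for j in range(len(row) - 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def ar1_rows(h, w, rho, rng):
    """Toy image: each row is a unit-variance AR(1) sequence with coefficient rho."""
    img = []
    for _ in range(h):
        row = [rng.gauss(0, 1)]
        for _ in range(w - 1):
            # Scaling the noise by sqrt(1 - rho^2) keeps the variance at 1.
            row.append(rho * row[-1] + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1))
        img.append(row)
    return img


def subsample2(img):
    """Keep every second row and column (a nearest-neighbour style 2x reduction)."""
    return [row[::2] for row in img[::2]]


rng = random.Random(0)
img = ar1_rows(128, 128, rho=0.9, rng=rng)
before = neighbor_corr(img)             # neighbours are one step apart
after = neighbor_corr(subsample2(img))  # neighbours were two steps apart originally
```

In this model, pixels that are adjacent after subsampling were two steps apart in the original, so their correlation drops from roughly rho to roughly rho squared. Cropping, by contrast, leaves neighbouring pixels untouched, which matches the motivation for the RandomCrop and CenterCrop choices listed under Figure 4.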