FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies

Shuqiao Liang, Jian Liu, Renzhang Chen, Quanlong Guan

TL;DR

FerretNet introduces Local Pixel Dependencies (LPD) as a universal artifact representation for detecting synthetic images from GANs, VAEs, and LDMs. LPD uses zero-masked median reconstruction to reveal disruptions in local texture and edges. FerretNet itself is a compact detector with 1.1M parameters built on depthwise separable and dilated convolutions. It reaches 97.1% average accuracy across 22 generative models and runs at 772 FPS on an RTX 4090, outperforming several lightweight baselines; the work also introduces the Synthetic-Pop benchmark. The result is a practical, model-agnostic detector that generalizes well to high-fidelity synthetic images, with potential impact on content authentication and forgery mitigation.

Abstract

The increasing realism of synthetic images generated by advanced models such as VAEs, GANs, and LDMs poses significant challenges for synthetic image detection. To address this issue, we explore two artifact types introduced during the generation process: (1) latent distribution deviations and (2) decoding-induced smoothing effects, which manifest as inconsistencies in local textures, edges, and color transitions. Leveraging local pixel dependencies (LPD) properties rooted in Markov Random Fields, we reconstruct synthetic images using neighboring pixel information to expose disruptions in texture continuity and edge coherence. Building upon LPD, we propose FerretNet, a lightweight neural network with only 1.1M parameters that delivers efficient and robust synthetic image detection. Extensive experiments demonstrate that FerretNet, trained exclusively on the 4-class ProGAN dataset, achieves an average accuracy of 97.1% on an open-world benchmark comprising 22 generative models. Our code and datasets are publicly available at https://github.com/xigua7105/FerretNet.
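
The neighbor-based reconstruction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lpd_map`, the 3x3 default window, the reflect padding, and the choice to drop the center pixel from the median (rather than zeroing it) are all assumptions made for illustration. Each pixel is re-estimated as the median of its surrounding neighbors, and the residual against the original is returned.

```python
import numpy as np


def lpd_map(image: np.ndarray, k: int = 3) -> np.ndarray:
    """Residual between each pixel and the median of its k x k neighbors.

    Hypothetical helper: the center pixel is excluded from the median,
    so the reconstruction depends only on neighboring pixels.
    """
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")

    img = image.astype(np.float32)
    grayscale = img.ndim == 2
    if grayscale:
        img = img[..., None]  # work on H x W x C throughout

    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w, _ = img.shape

    # Collect one shifted copy of the image per neighbor offset.
    neighbors = []
    for dy in range(k):
        for dx in range(k):
            if dy == pad and dx == pad:
                continue  # skip the center pixel (assumed masking scheme)
            neighbors.append(padded[dy:dy + h, dx:dx + w])

    # Median over the neighbor axis gives the reconstruction.
    reconstruction = np.median(np.stack(neighbors, axis=0), axis=0)
    residual = img - reconstruction
    return residual[..., 0] if grayscale else residual
```

On a perfectly flat image the residual is zero everywhere, while an isolated outlier pixel survives in the residual because its neighbors cannot predict it. A detector would take such a residual map as input instead of the raw image.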

Paper Structure

This paper contains 30 sections, 5 equations, 7 figures, 10 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The image generation process in VAEs, GANs, and LDMs can be broadly divided into two stages: obtaining the latent variable $z$, and decoding it into an image.
  • Figure 2: Local pixel dependencies (LPD) comparison between real and synthetic images. Top row: real images (COCO, LAION) and synthetic images (BigGAN, SDXL-Turbo, StyleGAN, RealVisXL-4.0). Bottom row: LPD maps derived from neighborhood-median reconstruction emphasize structural differences.
  • Figure 3: Pipeline of FerretNet: computation of local pixel median discrepancy for artifact representation, followed by lightweight detection using depthwise separable and dilated convolutions.
  • Figure 4: Examples of images generated by different models along with their corresponding text prompts. Each subfigure presents an image produced by a specific model, where the format “Model: Prompt” denotes the generating model and its input description.
  • Figure 5: LPD and Grad-CAM visualizations of real and fake images.
  • ...and 2 more figures
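
The Figure 3 caption mentions depthwise separable and dilated convolutions. The short sketch below shows, using the standard parameter-count and receptive-field formulas, why these two building blocks suit a compact detector. The helper names and the channel widths in the example are hypothetical and are not taken from FerretNet.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel at a given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Illustrative layer sizes only (assumed, not from the paper).
standard = conv_params(64, 128, 3)                  # 73728
separable = depthwise_separable_params(64, 128, 3)  # 8768
wide = effective_kernel(3, 2)                       # 5
```

For this example layer the separable form needs roughly 8x fewer weights, and a dilation of 2 lets a 3x3 kernel cover a 5x5 area without adding any parameters.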