UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints

Inzamamul Alam, Muhammad Shahid Muneer, Simon S. Woo

TL;DR

UGAD distinguishes real images from AI-generated fakes by combining frequency-domain analysis in the YCbCr color space with a novel Radial Integral Operation (RIO) and a Spatial Fourier Unit (SFU), which together extract robust spectral-spatial features. A ResNet152 backbone fuses these features for classification, yielding higher accuracy and AUC than prior methods across diverse GAN and diffusion-model datasets. Extensive ablations show that YCbCr preprocessing, the split-shift SFU operations, and the combined RIO+SFU pipeline each contribute materially, and inference takes roughly 400 ms, which is practical for deployment. The approach is demonstrated on a large, heterogeneous dataset and deployed in a live detection system, underscoring its potential for real-world safeguarding against AI-generated misinformation.

Abstract

In the wake of a fabricated image of an explosion at the Pentagon, the ability to discern real images from fake counterparts has never been more critical. Our study introduces a novel multi-modal approach to detecting AI-generated images amid the proliferation of new generation methods such as diffusion models. Our method, UGAD, comprises three key detection steps. First, we transform the RGB images into YCbCr channels and apply an Integral Radial Operation to emphasize salient radial features. Second, the Spatial Fourier Extraction operation performs a spatial shift and uses a pre-trained deep learning network for feature extraction. Finally, the deep neural network classification stage processes the data through dense layers, using softmax for classification. Our approach significantly improves the accuracy of differentiating between real and AI-generated images, as evidenced by a 12.64% increase in accuracy and a 28.43% increase in AUC compared to existing state-of-the-art methods.
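The first step described above (RGB to YCbCr, then a radial operation on the spectrum) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the radial operation is an azimuthal average of the log power spectrum over integer radii, uses the standard full-range BT.601 conversion matrix, and the function names `rgb_to_ycbcr`, `radial_profile`, and `rio_features` are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB array in [0, 255] to YCbCr (full-range BT.601)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycbcr = rgb @ m.T
    ycbcr[..., 1:] += 128.0  # offset the two chroma channels
    return ycbcr

def radial_profile(channel, n_bins=None):
    """Average the log power spectrum of a 2-D channel over rings of equal radius.

    Assumption: this azimuthal average stands in for the paper's radial operation.
    """
    h, w = channel.shape
    spectrum = np.fft.fftshift(np.fft.fft2(channel))  # zero frequency moved to the centre
    power = np.log1p(np.abs(spectrum) ** 2)
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int)  # integer ring index
    if n_bins is None:
        n_bins = min(h, w) // 2
    mask = radius < n_bins  # drop the corners beyond the largest full ring
    sums = np.bincount(radius[mask], weights=power[mask], minlength=n_bins)
    counts = np.bincount(radius[mask], minlength=n_bins)
    return sums / np.maximum(counts, 1)

def rio_features(rgb, n_bins=None):
    """Stack one radial profile per YCbCr channel into a (3, n_bins) array."""
    ycbcr = rgb_to_ycbcr(rgb)
    return np.stack([radial_profile(ycbcr[..., c], n_bins) for c in range(3)])
```

Calling `rio_features(image)` on an `(H, W, 3)` array returns one spectral profile per channel; profiles of this kind are the sort of frequency fingerprint that could then be compared across generators.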

Paper Structure (17 sections, 17 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: Overview of our approach, UGAD. Stages 1 through 4 involve RGB to YCbCr conversion for luminance and chrominance extraction, followed by FFT for spectral analysis and the Radial Integral Operation (RIO). Stages 5 through 10 show the Spatial Fourier Unit (SFU) processes, including splitting, concatenation, Spatial Feature Extraction (SFE), batch normalization, and spatial shifting, as a multi-modal architecture. The input image is represented in three dimensions: height $(H)$, width $(W)$, and channels $(C)$. Finally, Stage 12 fuses the RIO output from Stage 4 with the ResNet architecture from Stage 11.
  • Figure 2: Each row displays images generated by Diffusion Models (DM) sourced from various online platforms.
  • Figure 3: Effect of augmentation methods on detector performance. All detectors are trained on ProGAN and evaluated on other generators, with their respective accuracies shown. Augmentation generally improves performance, although there are notable exceptions, such as MidjourneyV5.
  • Figure 4: Spectrum analysis graph for YCbCr images after applying RIO to 10K different images. The X-axis shows the radius index and the Y-axis shows the power spectrum intensity. The power spectrum intensity differs for each generation method across DM and GAN approaches. The rigid lines represent the mean over the 10K images.
  • Figure 5: Sample inference results on real-life images for the FaceSwap method.
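
The fusion and classification stage named in the Figure 1 caption (Stage 12) and in the abstract (dense layers with softmax) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's architecture: fusion is assumed to be plain concatenation, a single dense layer stands in for the dense stack, the weights are random, and `fuse_and_classify` and all array shapes are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def fuse_and_classify(rio_feat, backbone_feat, weights, bias):
    """Concatenate both feature vectors, apply one dense layer, then softmax.

    Assumption: concatenation is used as the fusion rule; the source does not
    specify how the two branches are combined.
    """
    fused = np.concatenate([rio_feat.ravel(), backbone_feat.ravel()])
    return softmax(fused @ weights + bias)

rng = np.random.default_rng(0)
rio_feat = rng.normal(size=(3, 16))     # hypothetical RIO output: 3 channels x 16 radii
backbone_feat = rng.normal(size=2048)   # hypothetical pooled backbone embedding
weights = rng.normal(scale=0.01, size=(3 * 16 + 2048, 2))  # untrained, for shape only
bias = np.zeros(2)

probs = fuse_and_classify(rio_feat, backbone_feat, weights, bias)
print(probs)  # two class probabilities (real vs. AI-generated) that sum to 1
```

With trained weights, the larger of the two probabilities would give the predicted class; here the values only demonstrate the data flow.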