Detecting GAN-generated Imagery using Color Cues

Scott McCloskey, Michael Albright

TL;DR

This paper tackles the challenge of distinguishing GAN-generated imagery from real camera images, with the aim of curbing online disinformation. It analyzes the color formation and normalization steps of the GAN generator and derives two cues: depth-to-RGB color coupling and intensity-constraining normalization. Two detectors are proposed: a color-forensics approach using chromaticity histograms with an INH classifier, and a saturation-based approach using exposure-frequency features with a linear SVM. On the MFC18 datasets, the saturation-based method reaches an AUC of up to ≈0.70 on fully GAN-generated images and ≈0.61 on mixed content, while the color-based signal is weak. The results underscore the value of architecture-aware forensics as GANs evolve, and the need to complement artifact-based methods.
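As a rough illustration of what "exposure-frequency features" could look like, the sketch below computes the fraction of under- and over-exposed pixels in an 8-bit grayscale image at a few thresholds. The function name `exposure_frequency_features` and the threshold values are hypothetical choices made for this example, not the authors' settings, and plain Python lists are used only to keep the sketch dependency-free. A feature vector of this kind could then be passed to a linear SVM.

```python
from typing import List, Sequence


def exposure_frequency_features(
    gray: Sequence[Sequence[int]],
    low_thresholds: Sequence[int] = (0, 5, 10),        # illustrative values (assumption)
    high_thresholds: Sequence[int] = (245, 250, 255),  # illustrative values (assumption)
) -> List[float]:
    """Return fractions of under- and over-exposed pixels in an 8-bit grayscale image."""
    pixels = [value for row in gray for value in row]
    if not pixels:
        raise ValueError("image is empty")
    n = len(pixels)
    # Fraction of pixels at or below each low threshold (under-exposure).
    under = [sum(v <= t for v in pixels) / n for t in low_thresholds]
    # Fraction of pixels at or above each high threshold (over-exposure).
    over = [sum(v >= t for v in pixels) / n for t in high_thresholds]
    return under + over


# Toy data: a camera-like patch with clipped shadows and highlights,
# and a GAN-like patch whose intensities never reach either extreme.
camera_like = [[0, 0, 128, 255], [3, 200, 255, 255]]
gan_like = [[20, 60, 128, 230], [40, 200, 220, 235]]

camera_features = exposure_frequency_features(camera_like)
gan_features = exposure_frequency_features(gan_like)
```

On the toy patches, the camera-like image yields non-zero frequencies at every threshold, while the GAN-like image yields all zeros, which is the kind of separation the saturation cue relies on.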

Abstract

Image forensics is an increasingly relevant problem, as it can potentially address online disinformation campaigns and mitigate problematic aspects of social media. Of particular interest, given its recent successes, is the detection of imagery produced by Generative Adversarial Networks (GANs), e.g., 'deepfakes'. Leveraging large training sets and extensive computing resources, recent work has shown that GANs can be trained to generate synthetic imagery which is (in some ways) indistinguishable from real imagery. We analyze the structure of the generating network of a popular GAN implementation, and show that the network's treatment of color is markedly different from a real camera in two ways. We further show that these two cues can be used to distinguish GAN-generated imagery from camera imagery, demonstrating effective discrimination between GAN imagery and real camera images used to train the GAN.

Paper Structure

This paper contains 12 sections, 7 equations, 7 figures.

Figures (7)

  • Figure 1: Example artifacts evident in GAN-generated imagery. Top image shows checkerboard artifacts introduced by deconvolution steps. Bottom image shows mismatched eye colors, similar to a cue used in existing forensics [albanyEyes].
  • Figure 2: Example of the generator architecture from [nVidia]. The high-resolution image is produced from an input 'latent vector' by repeated upsampling (doubling the spatial dimensions), followed by 3x3 convolutions with leaky-ReLU activations and pixel-wise normalization. The final color image is generated by a 1x1 convolution.
  • Figure 3: (Left) The last layers of a GAN's generator collapse multiple 'depth' layers to red, green, and blue pixel values via convolutions that span the depth layers but have limited spatial extent. (Center) The weights used for face image synthesis in [nVidia] to collapse 16 depth layers to red, green, and blue are plotted. (Right) By contrast, the spectral responses of real cameras' color filter arrays [CanonSpectral] vary from camera to camera, but have a structure which is quite different from the learned weights of a GAN.
  • Figure 4: Example images (top row) and grayscale histograms (bottom row) for two real images (left, right) and one GAN image (center) from [nVidia]. Whereas the real images feature regions of under- or over-exposure (left and right images, respectively), GAN images (e.g., center) lack regions of saturation even when the background is white.
  • Figure 5: ROC curves showing the performance of the saturation frequency SVM on the two GAN datasets from MFC18.
  • ...and 2 more figures
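
The pixel-wise normalization and 1x1 color convolution mentioned in the captions for Figures 2 and 3 can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions: the normalization is taken to divide each pixel's depth vector by its root-mean-square value (the captions do not give the formula), and the names `pixel_norm` and `to_rgb` as well as the toy weights are hypothetical, not the learned weights plotted in Figure 3.

```python
import math
from typing import List, Sequence


def pixel_norm(features: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Scale one pixel's depth vector to (approximately) unit mean square.

    Assumed form: x / sqrt(mean(x^2) + eps); eps is an assumed stabilizer.
    """
    mean_sq = sum(f * f for f in features) / len(features)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [f * scale for f in features]


def to_rgb(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float] = (0.0, 0.0, 0.0),
) -> List[float]:
    """Apply a 1x1 convolution at one pixel: three weighted sums over depth."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy example with 2 depth layers instead of 16 (weights are made up).
small = pixel_norm([3.0, 4.0])
large = pixel_norm([3000.0, 4000.0])  # same direction, 1000x the magnitude
rgb = to_rgb([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Because the normalized vector depends only on the direction of the depth vector and not on its magnitude, `small` and `large` are nearly identical. This is one way to see how such a normalization constrains output intensity, and since all three color channels are weighted sums of the same depth values, they are coupled in a way that a camera's color filter responses are not.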