Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection

Anirudh Sundara Rajan, Yong Jae Lee

TL;DR

The paper addresses the vulnerability of fake image detectors to spurious real-image features that degrade robustness and generalization. It proposes Stay-Positive, a method that retrains only the detector's last layer with non-negative weights, forcing decisions to hinge on generator-specific artifacts and ignoring real-image cues. Empirical results show improved generalization across post-processing, downsampling, and unseen generators, and notably better detection of partially inpainted real images, with ablations validating the last-layer retraining design. This approach enhances robustness in practical forensic settings and suggests a general principle for media forensics: focus on fake artifacts and minimize reliance on real-distribution features, with potential extensions to audio and video.
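The last-layer idea described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes the frozen backbone yields non-negative penultimate activations (e.g. post-ReLU), uses a hypothetical two-feature toy dataset (feature 0 stands in for a generator artifact, feature 1 for a real-image cue), and enforces non-negativity by clamping the weights at zero after each gradient step, which is one possible way to impose the constraint. The function names `retrain_last_layer_nonneg` and `score` are made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Fake probability produced by the last layer for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def retrain_last_layer_nonneg(features, labels, epochs=500, lr=0.5, seed=0):
    """Fit a logistic last layer on frozen features with weights kept >= 0.

    Assumption: features are non-negative activations from a frozen backbone;
    labels are 1 for fake and 0 for real.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    w = [rng.uniform(0.0, 0.1) for _ in range(dim)]
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(w, b, x) - y  # gradient of the BCE loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # Projection step (assumed mechanism): clamp weights at zero so that
        # no feature can lower the fake score, i.e. act as evidence of "real".
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy features: [fake_artifact, real_cue]
features = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.8]]
labels = [1, 1, 0, 0]
w, b = retrain_last_layer_nonneg(features, labels)

fully_fake = [0.9, 0.0]
inpainted_real = [0.9, 0.9]  # real cues present alongside generator artifacts
print("weights:", w, "bias:", b)
print("fully fake:", score(w, b, fully_fake))
print("inpainted real:", score(w, b, inpainted_real))
```

In this toy setup the weight on the real-image cue is driven to zero, so an image that carries both real cues and generator artifacts receives the same fake score as one carrying only the artifacts. This mirrors, in a very simplified form, why a detector restricted to fake features would be expected to handle partially inpainted real images better.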

Abstract

Detecting AI-generated images is a challenging yet essential task. A primary difficulty arises from the detector's tendency to rely on spurious patterns, such as compression artifacts, which can influence its decisions. These issues often stem from specific patterns that the detector associates with the real data distribution, making it difficult to isolate the actual generative traces. We argue that an image should be classified as fake if and only if it contains artifacts introduced by the generative model. Based on this premise, we propose Stay-Positive, an algorithm designed to constrain the detector's focus to generative artifacts while disregarding those associated with real data. Experimental results demonstrate that detectors trained with Stay-Positive exhibit reduced susceptibility to spurious correlations, leading to improved generalization and robustness to post-processing. Additionally, unlike detectors that associate artifacts with real images, those that focus purely on fake artifacts are better at detecting inpainted real images.

Paper Structure

This paper contains 47 sections, 4 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: Sensitivity to WEBP Compression. Using the LSUN dataset, which contains WEBP compressed images, as part of the real distribution makes the network highly vulnerable to WEBP compression.
  • Figure 2: Image Quality-Based Spurious Features. Corvi outputs a higher real score for Flux reconstructions than for LDM reconstructions, demonstrating the spurious nature of these real features. The fake score decreases because a different generator is used.
  • Figure 3: Our key idea involves two steps. (1) We first train a fake image detector in the standard way without any modifications. This detector focuses on both real and fake features. (2) We re-train the last layer of the network so that it relies only on the fake features to make a decision.
  • Figure 4: Improved Robustness to WEBP Compression. Compared to the original Corvi and Rajan, our detectors Corvi$\oplus$ and Rajan$\oplus$ show increased robustness towards WEBP Compression.
  • Figure 5: Improved Robustness to Downsizing. Compared to the original Corvi, our Corvi$\oplus$ shows increased robustness towards downsampling.
  • ...and 9 more figures