Improving Interpretability and Robustness for the Detection of AI-Generated Images

Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh

TL;DR

The paper tackles the robustness gap in AI-generated image (AIGI) detection across domains and generative models. It analyzes CLIP-based detectors and introduces three robustness-enhancing strategies: residual-vector classification, embedding feature pruning, and detectors built on selected Transformer attention heads, with interpretable text-based explanations. A new, diverse dataset for AIGI detection is proposed and used to benchmark cross-generator transfer. The findings are that both feature pruning and head selection can substantially improve out-of-domain performance (absolute gains of up to around 6–7%) and that the isotropy of the embedding space improves when misleading dimensions are removed. The work advances practical AIGI detection by combining interpretability with transfer robustness, and it provides the dataset and code to spur further research.

Abstract

With the growing abilities of generative models, detecting artificial content is becoming an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: removing harmful components of the embedding vector, and selecting the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
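
The first robustness idea in the abstract, removing harmful components of the embedding vector, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the data are synthetic Gaussian vectors rather than CLIP embeddings, the classifier is a simple mean-difference linear rule standing in for whatever linear probe the paper trains, and the selection rule (drop the dimension whose removal most improves accuracy on a held-out generator) is a hypothetical criterion, not the procedure from the paper's Section 5.1. The helper names `make_split`, `fit_mean_difference`, `accuracy`, and `select_dims_to_remove` are likewise invented here.

```python
import random


def make_split(n, robust_shift, spurious_shift, dim=8, seed=0):
    """Synthetic stand-in for frozen image embeddings (NOT real CLIP features).

    Dim 0 separates real from generated images the same way for every
    generator; dim 1 is generator-specific, so its shift can flip sign.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = real, 1 = AI-generated
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            x[0] += label * robust_shift
            x[1] += label * spurious_shift
            X.append(x)
            y.append(label)
    return X, y


def fit_mean_difference(X, y, keep):
    """Linear rule: w = mean(fake) - mean(real) on kept dims, midpoint threshold."""
    dim = len(X[0])
    means = []
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means.append([sum(r[d] for r in rows) / len(rows) for d in range(dim)])
    # Pruned dimensions get zero weight, so they cannot influence the score.
    w = [(means[1][d] - means[0][d]) if d in keep else 0.0 for d in range(dim)]
    b = -sum(w[d] * (means[0][d] + means[1][d]) / 2 for d in range(dim))
    return w, b


def accuracy(model, X, y):
    w, b = model
    hits = 0
    for x, t in zip(X, y):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += (score > 0) == bool(t)
    return hits / len(y)


def select_dims_to_remove(train, val, k):
    """Hypothetical criterion: rank dims by held-out accuracy gain when removed."""
    X_tr, y_tr = train
    all_dims = set(range(len(X_tr[0])))
    base = accuracy(fit_mean_difference(X_tr, y_tr, all_dims), *val)
    gains = {
        d: accuracy(fit_mean_difference(X_tr, y_tr, all_dims - {d}), *val) - base
        for d in all_dims
    }
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Three synthetic "generators": the spurious dim 1 changes sign across them.
train = make_split(500, robust_shift=2.0, spurious_shift=2.0, seed=1)
val = make_split(500, robust_shift=2.0, spurious_shift=-1.0, seed=2)
test = make_split(500, robust_shift=2.0, spurious_shift=-2.0, seed=3)

removed = select_dims_to_remove(train, val, k=1)
keep_all = set(range(8))
acc_full = accuracy(fit_mean_difference(*train, keep_all), *test)
acc_pruned = accuracy(fit_mean_difference(*train, keep_all - set(removed)), *test)
print("removed dims:", removed)
print("unseen-generator accuracy, full vs pruned:", acc_full, acc_pruned)
```

In this toy setup the classifier trained on one generator leans on a dimension that behaves differently for the unseen generator, so dropping that dimension raises transfer accuracy. Whether real CLIP embeddings contain such dimensions, and how best to find them, is what the paper itself investigates.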

Paper Structure

This paper contains 13 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: AIGI detection: (a) CLIP embedding space; (b) attention heads and feature selection.
  • Figure 2: Classification on CLIP embeddings. Left: original embeddings (mean accuracy: 78.33%; mean accuracy without SD-1.4.-200 and ProGAN: 77.74%). Right: embeddings with the "bad" dimensions removed (mean accuracy: 80.36%; mean accuracy without SD-1.4.-200 and ProGAN: 79.69%).
  • Figure 3: Classification on CLIP embeddings with fit_intercept=False. Left: original embeddings (mean accuracy: 78.31%). Right: embeddings with the "bad" dimensions removed (mean accuracy: 80.31%).
  • Figure 4: Classification on CLIP residuals. Left: original embeddings (mean accuracy: 75.39%). Right: embeddings with the "bad" dimensions removed (mean accuracy: 77.86%).
  • Figure 5: Accuracy (vertical axis) as a function of the number of components removed from the CLIP-large embedding (horizontal axis), with the method described in Section 5.1.
  • ...and 1 more figure