Improving Interpretability and Robustness for the Detection of AI-Generated Images

Tatiana Gaintseva; Laida Kushnareva; German Magai; Irina Piontkovskaya; Sergey Nikolenko; Martin Benning; Serguei Barannikov; Gregory Slabaugh

TL;DR

The paper tackles the robustness gap in AI-generated image (AIGI) detection across domains and generative models. It analyzes CLIP-based detectors and introduces three robustness-enhancing strategies: residual-vector classification, embedding feature pruning, and detectors built on selected Transformer attention heads, with interpretable text-based explanations. A new, diverse AIGI detection dataset is proposed and used to benchmark cross-generator transfer. The findings are that both feature pruning and head selection can improve the mean out-of-distribution classification score (by up to 6% for cross-model transfer) and that the embedding space becomes more isotropic once misleading dimensions are removed. The work combines interpretability with transfer robustness and provides the dataset and code to support further research.

Abstract

With the growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: one based on removing harmful components of the embedding vector and one based on selecting the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
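
The first idea in the abstract, removing harmful components of the embedding vector, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and not the paper's procedure: the 6-dimensional synthetic vectors stand in for CLIP embeddings, a nearest-centroid linear classifier is swapped in for the classifier trained on the embeddings, and the greedy backward elimination in the hypothetical `prune` helper is only one plausible way to pick dimensions to drop. In the toy data, dimension 0 is a real/fake cue shared by all generators, while dimension 1 is a generator-specific cue whose sign flips on unseen generators.

```python
import random

DIMS = 6  # toy embedding size (assumption; real CLIP embeddings are far larger)

def sample(n, fake, shift, rng):
    """Draw n toy embeddings. Dim 0: stable real/fake cue. Dim 1: generator-specific cue."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(DIMS)]
        x[0] += 2.0 if fake else -2.0
        if fake:
            x[1] += shift  # differs between generators
        rows.append(x)
    return rows

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(real, fake, keep):
    """Nearest-centroid linear classifier restricted to the kept dimensions."""
    mu_r, mu_f = centroid(real), centroid(fake)
    w = [mu_f[d] - mu_r[d] for d in keep]
    b = -sum(wi * (mu_f[d] + mu_r[d]) / 2 for wi, d in zip(w, keep))
    return w, b

def accuracy(model, keep, real, fake):
    w, b = model
    def score(x):
        return sum(wi * x[d] for wi, d in zip(w, keep)) + b
    hits = sum(score(x) <= 0 for x in real) + sum(score(x) > 0 for x in fake)
    return hits / (len(real) + len(fake))

def prune(train, val):
    """Greedily drop a dimension whenever doing so raises accuracy on a held-out generator."""
    keep = list(range(DIMS))
    best = accuracy(fit(*train, keep), keep, *val)
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for d in list(keep):
            if len(keep) == 1:
                break
            trial = [k for k in keep if k != d]
            acc = accuracy(fit(*train, trial), trial, *val)
            if acc > best:
                keep, best, improved = trial, acc, True
    return keep

rng = random.Random(0)
train = (sample(200, False, 0.0, rng), sample(200, True, 2.0, rng))   # generator A
val = (sample(200, False, 0.0, rng), sample(200, True, -2.0, rng))    # generator B
test = (sample(200, False, 0.0, rng), sample(200, True, -2.0, rng))   # generator C

all_dims = list(range(DIMS))
before = accuracy(fit(*train, all_dims), all_dims, *test)
keep = prune(train, val)
after = accuracy(fit(*train, keep), keep, *test)
print(f"kept dims: {keep}; accuracy on unseen generator: {before:.3f} -> {after:.3f}")
```

In this toy setup the generator-specific dimension is the one that gets pruned, and accuracy on the unseen generator rises as a result. The sketch only shows why dropping a misleading dimension can help cross-generator transfer; it says nothing about the magnitude of the gains reported in the paper.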

Paper Structure (13 sections, 2 equations, 6 figures, 7 tables)

Introduction
Related Work
Data
Methods
Experimental evaluation
Limitations and broader impacts
Conclusion
Detailed experimental results on the detection of generated images
Computational resources
Full results for embeddings and residuals
Removing the components of CLIP embeddings
Accuracy plots
Removing "bad" outliers and how it influences the geometry of embeddings

Figures (6)

Figure 1: AIGI detection: (a) CLIP embedding space; (b) attention heads and feature selection.
Figure 2: Classification on CLIP embeddings. Left: original embeddings (mean accuracy: 78.33%; mean accuracy without SD-1.4.-200 and ProGAN: 77.74%). Right: embeddings where "bad" dimensions are removed (mean accuracy: 80.36%; mean accuracy without SD-1.4.-200 and ProGAN: 79.69%).
Figure 3: Classification on CLIP embeddings with fit_intercept=False. Left: original embeddings (mean accuracy: 78.31%). Right: embeddings where "bad" dimensions are removed (mean accuracy: 80.31%).
Figure 4: Classification on CLIP residuals. Left: original embeddings (mean accuracy: 75.39%). Right: embeddings where "bad" dimensions are removed (mean accuracy: 77.86%).
Figure 5: Accuracy (vertical axis) as a function of the number of components removed from the CLIP-large embedding (horizontal axis), with the method described in Section 5.1.
...and 1 more figure
