Improving Interpretability and Robustness for the Detection of AI-Generated Images
Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh
TL;DR
The paper tackles the robustness gap in AI-generated image (AIGI) detection across domains and generative models. It analyzes CLIP-based detectors and introduces three robustness-enhancing strategies: residual-vector classification, embedding feature pruning, and detectors built on selected Transformer attention heads, with interpretable text-based explanations. A new, diverse dataset for AIGI detection is proposed and used to benchmark cross-generator transfer. The findings are that both feature pruning and head selection can substantially improve out-of-domain performance (by up to about 6% in mean OOD classification score) and that the embedding space becomes more isotropic once misleading dimensions are removed. The work advances practical AIGI detection by combining interpretability with transfer robustness, and it provides the dataset and code to spur further research.
Abstract
With the growing capabilities of generative models, detecting artificial content becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: one removes harmful components of the embedding vector, and the other selects the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
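To make the idea of removing harmful embedding components concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes Gaussian stand-ins for frozen image embeddings (no real CLIP features are used), a nearest-class-mean linear classifier, and a leave-one-dimension-out search on a held-out "validation generator". The helper names (`make_domain`, `fit_linear`, `ood_accuracy`) and all numeric settings are hypothetical choices for illustration.

```python
import numpy as np

def make_domain(rng, n, spurious_shift, dim=8):
    """Synthetic stand-in for frozen image embeddings (NOT real CLIP features)."""
    X = rng.normal(size=(2 * n, dim))
    y = np.repeat([0, 1], n)          # 0 = real image, 1 = generated image
    X[y == 1, 0] += 2.0               # dim 0: cue shared by every generator
    X[y == 1, 1] += spurious_shift    # dim 1: generator-specific cue
    return X, y

def fit_linear(X, y):
    """Nearest-class-mean linear classifier: predict 'generated' if w @ x + b > 0."""
    mu_real, mu_fake = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    w = mu_fake - mu_real
    b = -0.5 * w @ (mu_fake + mu_real)
    return w, b

def accuracy(model, X, y):
    w, b = model
    return float((((X @ w + b) > 0).astype(int) == y).mean())

def ood_accuracy(keep, train, other):
    """Fit on `train` using only the kept dimensions, then score on `other`."""
    (X_tr, y_tr), (X_o, y_o) = train, other
    return accuracy(fit_linear(X_tr[:, keep], y_tr), X_o[:, keep], y_o)

rng = np.random.default_rng(0)
train = make_domain(rng, 500, spurious_shift=4.0)    # training generator
val = make_domain(rng, 500, spurious_shift=-4.0)     # held-out generator for selection
test = make_domain(rng, 500, spurious_shift=-4.0)    # held-out generator for reporting

dim = train[0].shape[1]
keep_all = np.ones(dim, dtype=bool)

# Leave-one-dimension-out search: which single dimension, once removed,
# gives the best accuracy on the validation generator?
val_scores = []
for j in range(dim):
    keep = keep_all.copy()
    keep[j] = False
    val_scores.append(ood_accuracy(keep, train, val))
harmful = int(np.argmax(val_scores))

keep_pruned = keep_all.copy()
keep_pruned[harmful] = False
acc_full = ood_accuracy(keep_all, train, test)
acc_pruned = ood_accuracy(keep_pruned, train, test)
print(f"pruned dim {harmful}: OOD accuracy {acc_full:.3f} -> {acc_pruned:.3f}")
```

In this toy setup the classifier leans on a dimension whose meaning flips between generators, so dropping that dimension improves transfer. The magnitude of the improvement here is an artifact of the synthetic data and says nothing about the gains reported in the paper.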
