GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image
Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, Yunhe Wang
TL;DR
The paper addresses the need for robust detectors that distinguish AI-generated images from real ones as diffusion models and GANs rapidly improve. It introduces GenImage, a million-scale benchmark aligned to ImageNet that combines 1,281,167 real images with 1,350,000 fake images produced by eight state-of-the-art generators across 1,000 classes. It proposes two practical tasks, Cross-Generator Image Classification and Degraded Image Classification, and analyzes data scale, frequency-domain artifacts, generator correlations, and content generalization. The findings show strong detection when training and test images come from the same generator but weak cross-generator generalization, especially for diffusion-based images, underscoring GenImage’s value for developing generator-agnostic detectors with real-world applicability.
Abstract
The extraordinary ability of generative models to produce photographic images has intensified concerns about the spread of disinformation, creating demand for detectors capable of distinguishing AI-generated fake images from real images. However, the lack of large datasets containing images from the most advanced image generators is an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) State-of-the-art Generators, synthesizing images with advanced diffusion models and GANs. These advantages allow detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating detection methods in settings that resemble real-world scenarios. The cross-generator image classification task measures the performance of a detector trained on one generator when tested on the others. The degraded image classification task assesses the capability of detectors to handle degraded images, such as low-resolution, blurred, and compressed images. With the GenImage dataset, researchers can expedite the development and evaluation of AI-generated image detectors that outperform prevailing methods.
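The cross-generator image classification task can be pictured as an accuracy matrix: one row per training generator, one column per test generator. The sketch below is a minimal illustration of that bookkeeping, not the authors' evaluation code. The function names, the generator labels `gen_a`, `gen_b`, `gen_c`, and the `dummy_evaluate` stub are all hypothetical, and the values the stub returns are placeholders rather than measurements.

```python
from statistics import mean
from typing import Callable, Dict, List

# acc[train_generator][test_generator] -> accuracy in [0, 1]
AccuracyMatrix = Dict[str, Dict[str, float]]


def cross_generator_matrix(
    generators: List[str],
    evaluate: Callable[[str, str], float],
) -> AccuracyMatrix:
    """Evaluate every (train generator, test generator) pair.

    `evaluate(train, test)` is assumed to train a detector on the
    real/fake images of `train` and return its accuracy on `test`.
    """
    return {
        train: {test: evaluate(train, test) for test in generators}
        for train in generators
    }


def cross_generator_mean(matrix: AccuracyMatrix) -> Dict[str, float]:
    """Mean accuracy on unseen generators only (diagonal excluded).

    Requires at least two generators; otherwise there is nothing to average.
    """
    return {
        train: mean(acc for test, acc in row.items() if test != train)
        for train, row in matrix.items()
    }


def dummy_evaluate(train_gen: str, test_gen: str) -> float:
    """Hypothetical stand-in for real training and testing.

    Returns placeholder values, not experimental results.
    """
    return 1.0 if train_gen == test_gen else 0.5


if __name__ == "__main__":
    names = ["gen_a", "gen_b", "gen_c"]  # hypothetical generator labels
    matrix = cross_generator_matrix(names, dummy_evaluate)
    for train, score in cross_generator_mean(matrix).items():
        print(f"trained on {train}: mean accuracy on other generators = {score}")
```

Keeping the diagonal out of the summary separates same-generator accuracy from generalization to unseen generators. Whether to report the average with or without the diagonal is a design choice this sketch makes for illustration only.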
