WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection
Yan Hong, Jianfu Zhang
TL;DR
WildFake tackles the challenge of detecting AI-generated images across diverse generators and real-world content. It introduces a large-scale, highly hierarchical dataset spanning GANs, diffusion models, and other generators, organized into five levels of cross--generational analysis. Through extensive experiments, detectors trained on WildFake show superior cross-dataset generalization and robustness to degradations compared with baselines, with ViT-based detectors particularly resilient. The dataset enables practical assessment of detector performance in real-world scenarios and offers insights into modern generative model capabilities.
Abstract
The extraordinary ability of generative models enabled the generation of images with such high quality that human beings cannot distinguish Artificial Intelligence (AI) generated images from real-life photographs. The development of generation techniques opened up new opportunities but concurrently introduced potential risks to privacy, authenticity, and security. Therefore, the task of detecting AI-generated imagery is of paramount importance to prevent illegal activities. To assess the generalizability and robustness of AI-generated image detection, we present a large-scale dataset, referred to as WildFake, comprising state-of-the-art generators, diverse object categories, and real-world applications. WildFake dataset has the following advantages: 1) Rich Content with Wild collection: WildFake collects fake images from the open-source community, enriching its diversity with a broad range of image classes and image styles. 2) Hierarchical structure: WildFake contains fake images synthesized by different types of generators from GANs, diffusion models, to other generative models. These key strengths enhance the generalization and robustness of detectors trained on WildFake, thereby demonstrating WildFake's considerable relevance and effectiveness for AI-generated detectors in real-world scenarios. Moreover, our extensive evaluation experiments are tailored to yield profound insights into the capabilities of different levels of generative models, a distinctive advantage afforded by WildFake's unique hierarchical structure.
