Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect
Amirtaha Amanzadi, Zahra Dehghanian, Hamid Beigy, Hamid R. Rabiee
TL;DR
This work defines a two-axis generalization framework for AI-generated image detection, arguing that detectors must generalize across unseen generators and across unseen visual domains. It introduces FusionDetect, a fusion-based detector that combines semantic features from CLIP with structural features from Dinov2 using frozen backbones and a lightweight MLP head, trained with binary cross-entropy. To evaluate universal performance, the OmniGen Benchmark is proposed, comprising 12 modern generators and a high-semantic-variance test set; FusionDetect achieves state-of-the-art results on both established benchmarks and OmniGen, with strong robustness to common perturbations. The study highlights the value of combining complementary foundational representations and provides a practical benchmark and framework to guide future universal fake-media detectors with real-world relevance.
Abstract
The rapid development of generative models has made it increasingly crucial to develop detectors that can reliably detect synthetic images. Although most of the work has now focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap,we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of evaluating detector performance under realistic conditions. In addition, we introduce a new method, FusionDetect, aimed at addressing both vectors of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP & Dinov2. By deriving features from both complementary models,we develop a cohesive feature space that naturally adapts to changes in both thecontent and design of the generator. Our extensive experiments demonstrate that FusionDetect delivers not only a new state-of-the-art, which is 3.87% more accurate than its closest competitor and 6.13% more precise on average on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen,along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect
