Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

Amirtaha Amanzadi, Zahra Dehghanian, Hamid Beigy, Hamid R. Rabiee

TL;DR

This work defines a two-axis generalization framework for AI-generated image detection, arguing that detectors must generalize both across unseen generators and across unseen visual domains. It introduces FusionDetect, a fusion-based detector that combines semantic features from CLIP with structural features from DINOv2, using frozen backbones and a lightweight MLP head trained with binary cross-entropy. To evaluate universal performance, the work proposes the OmniGen Benchmark, comprising 12 modern generators and a high-semantic-variance test set; FusionDetect achieves state-of-the-art results on both established benchmarks and OmniGen, with strong robustness to common perturbations. The study highlights the value of combining complementary foundation-model representations and provides a practical benchmark and framework to guide future universal fake-media detectors.

Abstract

The rapid development of generative models has made it increasingly crucial to develop detectors that can reliably identify synthetic images. Although most prior work has focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap, we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of evaluating detector performance. In addition, we introduce a new method, FusionDetect, aimed at addressing both axes of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP and DINOv2. By deriving features from both complementary models, we develop a cohesive feature space that naturally adapts to changes in both the content and design of the generator. Our extensive experiments demonstrate that FusionDetect not only sets a new state of the art, with 3.87% higher accuracy than its closest competitor and 6.13% higher precision on average on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen, along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect
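The fusion design summarized above (features from two frozen encoders, combined and passed to a small MLP head trained with binary cross-entropy) can be illustrated with a minimal, dependency-free sketch. Several things here are assumptions rather than details from the paper: the fusion is shown as plain concatenation, the feature vectors are random placeholders standing in for CLIP and DINOv2 embeddings, and the dimensions, the single hidden layer, and all function names are hypothetical.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply a dense layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def fuse(clip_feat, dino_feat):
    """Assumed fusion: concatenate the two frozen-backbone embeddings."""
    return list(clip_feat) + list(dino_feat)


def head_logit(fused, hidden_layer, out_layer):
    """Lightweight MLP head: one hidden ReLU layer, then a single logit."""
    return linear(relu(linear(fused, hidden_layer)), out_layer)[0]


def bce_with_logit(logit, label):
    """Numerically stable binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


rng = random.Random(0)
# Placeholder embeddings; real ones would come from the frozen encoders.
clip_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]
dino_feat = [rng.gauss(0.0, 1.0) for _ in range(6)]

hidden = make_linear(len(clip_feat) + len(dino_feat), 4, rng)
out = make_linear(4, 1, rng)

logit = head_logit(fuse(clip_feat, dino_feat), hidden, out)
loss = bce_with_logit(logit, 1.0)  # label 1.0 = "fake" (labeling is assumed)
print(f"logit={logit:.4f}  loss={loss:.4f}")
```

Only the head's parameters would be updated during training in this setup; the encoders stay frozen, so their embeddings can be computed once and cached.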

Paper Structure

This paper contains 18 sections, 2 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: FusionDetect performance on OmniGen and on established benchmarks from previous works [genimage, imaginet, aide], compared to other detectors. The size of each bubble indicates the standard deviation of accuracy across all generators in the dataset (smaller is better).
  • Figure 2: t-SNE [tsne] projection of the GenImage [genimage], ImagiNet [imaginet], and Chameleon [aide] datasets.
  • Figure 3: t-SNE [tsne] projection of the GenImage [genimage], ImagiNet [imaginet], and Chameleon [aide] datasets. The CLIP+DINO encoder (bottom) successfully separates the real and fake classes for each dataset, unlike the other two options (top left: CLIP; top right: DINOv2).
  • Figure 4: OmniGen Benchmark images. Top row: Midjourney v7 [midjourney], HiDream [hidream], Imagen 4 [imagen], Kandinsky 3 [kandinsky]; middle row: Flux 1 [flux], Dreamshaper [dreamshaper], Pixart-$\delta$ [pixart], Cogview 4 [cogview]; bottom row: Juggernaut [juggernaut], SD3.5 [sd3.5], Imagen 4 Ultra [imagen], GPT4o [gpt4o].