OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild

Yuncheng Guo, Junyan Ye, Chenjue Zhang, Hengrui Kang, Haohuan Fu, Conghui He, Weijia Li

TL;DR

OmniAID introduces a decoupled Mixture-of-Experts detector that separates content-specific semantic flaws from universal artifacts to achieve universal AI-generated image (AIGI) detection in the wild. It combines a set of Routable Specialized Semantic Experts with a Fixed Universal Artifact Expert, optimized via a two-stage training regime, and validates robustness with Mirage, a modern large-scale dataset. Across the GenImage, Chameleon, Mirage-Test, and DRCT-2M benchmarks, OmniAID achieves state-of-the-art generalization, demonstrating resilience to distribution shifts and post-processing perturbations. This decoupled paradigm and the Mirage data foundation offer a practical, scalable path toward robust, real-world AIGI authentication.

Abstract

A truly universal AI-Generated Image (AIGI) detector must simultaneously generalize across diverse generative models and varied semantic content. Current state-of-the-art methods learn a single, entangled forgery representation, conflating content-dependent flaws with content-agnostic artifacts, and are further constrained by outdated benchmarks. To overcome these limitations, we propose OmniAID, a novel framework centered on a decoupled Mixture-of-Experts (MoE) architecture. The core of our method is a hybrid expert system designed to decouple: (1) semantic flaws across distinct content domains, and (2) content-dependent flaws from content-agnostic universal artifacts. This system employs a set of Routable Specialized Semantic Experts, each for a distinct domain (e.g., human, animal), complemented by a Fixed Universal Artifact Expert. This architecture is trained using a novel two-stage strategy: we first train the experts independently with domain-specific hard-sampling to ensure specialization, and subsequently train a lightweight gating network for effective input routing. By explicitly decoupling "what is generated" (content-specific flaws) from "how it is generated" (universal artifacts), OmniAID achieves robust generalization. To address outdated benchmarks and validate real-world applicability, we introduce Mirage, a new large-scale, contemporary dataset. Extensive experiments, using both traditional benchmarks and our Mirage dataset, demonstrate our model surpasses existing monolithic detectors, establishing a new and robust standard for AIGI authentication against modern, in-the-wild threats.
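As a rough illustration of the routing described above, the sketch below merges a frozen base weight with gated semantic residuals and an always-on artifact residual. It is not the authors' implementation: the function name `merge_expert_weights`, the softmax gate, the `top_k` selection, and the additive weight-space merge are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def merge_expert_weights(w_base, semantic_deltas, artifact_delta, gate_logits, top_k=1):
    """Merge routed semantic residuals and a fixed artifact residual into one weight.

    Assumed scheme (hypothetical): the router scores each semantic expert,
    only the top_k experts stay active, their gates are renormalized to sum
    to one, and the artifact residual is always added regardless of routing.
    """
    gates = softmax(np.asarray(gate_logits, dtype=float))
    active = np.argsort(gates)[::-1][:top_k]   # indices of the top_k experts
    sparse = np.zeros_like(gates)
    sparse[active] = gates[active]
    sparse /= sparse.sum()                     # renormalize over active experts

    merged = w_base + artifact_delta           # artifact expert is never gated
    for i in active:
        merged = merged + sparse[i] * semantic_deltas[i]
    return merged, sparse
```

With `top_k=1` the merged weight is simply the base weight plus the artifact residual plus the single best-scoring semantic residual. Larger `top_k` values blend several semantic experts in proportion to their renormalized gates.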

Paper Structure

This paper contains 44 sections, 6 equations, 10 figures, 19 tables.

Figures (10)

  • Figure 1: (a) Previous methods suffer from a monolithic, entangled representation, merging semantic flaws and universal artifacts, thereby restricting universality. (b) Our OmniAID solves this via decoupling: an input Router routes the image, specialized Semantic Detectors handle high-level flaws, and an Artifact Detector handles low-level features. The parameters from these active detectors are then aggregated into a final Aggregation Detector, which makes the robust, disentangled decision.
  • Figure 2: Semantic Generalization Gaps and Benchmark Limitations. (a)-(b) reveal poor cross-domain generalization, especially for the Anime, Human, and Animal domains. (c) highlights the severe performance collapse of GenImage SDv1.4-trained models on the real-world Chameleon dataset, underscoring profound benchmark limitations against in-the-wild distributional shift.
  • Figure 3: Architectural overview of the proposed OmniAID framework. The model employs a two-stage training strategy. Stage 1 (a): Expert Specialization, where domain-specific semantic experts (e.g., Human, Anime) and a universal Artifact Expert, both implemented as residual matrices after SVD decomposition, are trained independently using hard-sampling data. Stage 2 (b): Router Training, where a lightweight router is trained, and the system integrates the weights from various experts into a final weight.
  • Figure 4: SVD-based Weight Decomposition for Orthogonal MoE Adaptation.
  • Figure 5: Performance (Accuracy %) comparison on the in-the-wild Chameleon benchmark. To ensure a fair comparison, all models are trained on GenImage SDv1.4, except OmniAID-Mirage, which is trained on Mirage-Train.
  • ...and 5 more figures
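
The captions for Figures 3 and 4 mention experts implemented as residual matrices after an SVD of the weights. One plausible reading, sketched below under assumptions, splits a weight matrix into a top-`r` principal part and a residual part whose column spaces are orthogonal. The rank `r`, the function name `split_by_svd`, and the choice of which part an expert would adapt are hypothetical and not taken from the paper.

```python
import numpy as np


def split_by_svd(weight, rank):
    """Split `weight` into a rank-`rank` principal part and the remaining residual.

    Both parts come from the same thin SVD, so principal + residual
    reconstructs the original matrix and their column spaces are orthogonal.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Scale the selected left singular vectors by their singular values,
    # then project back with the matching right singular vectors.
    principal = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = (u[:, rank:] * s[rank:]) @ vt[rank:]
    return principal, residual
```

Under this assumed scheme, the principal part could stay frozen to preserve pretrained knowledge while each expert adapts only its own copy of the residual part. The orthogonality between the two parts keeps such updates from overwriting the dominant directions of the original weight.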