Multi-Feature Fusion Approach for Generative AI Images Detection

Abderrezzaq Sendjasni, Mohamed-Chaker Larabi

Abstract

The rapid evolution of Generative AI (GenAI) models has produced synthetic images of unprecedented realism, challenging traditional methods for distinguishing them from natural photographs. Existing detectors often rely on a single feature space, such as statistical regularities, semantic embeddings, or texture patterns, and therefore tend to lack robustness against diverse and evolving generative models. In this work, we systematically evaluate a multi-feature fusion framework that combines complementary cues from three distinct spaces: (1) Mean Subtracted Contrast Normalized (MSCN) features capturing low-level statistical deviations; (2) CLIP embeddings encoding high-level semantic coherence; and (3) Multi-scale Local Binary Patterns (MLBP) characterizing mid-level texture anomalies. Extensive experiments on four benchmark datasets covering a wide range of generative models show that individual feature spaces vary considerably in performance across generators. Crucially, fusing all three representations yields superior and more consistent performance, particularly in a challenging mixed-model scenario, and consistently outperforms state-of-the-art methods across all evaluated datasets. Overall, this work highlights the importance of hybrid representations for robust GenAI image detection and provides a principled framework for integrating complementary visual cues.
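To make the pipeline concrete, the following minimal Python sketch assembles the three branches and their fusion. The LBP radii, the MSCN summary statistics, the ViT-B/32 CLIP backbone (via the open_clip package), and the logistic-regression classifier are illustrative assumptions on our part, not the authors' exact configuration.

```python
# Minimal sketch of the three-branch fusion pipeline (illustrative settings).
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import skew, kurtosis
from skimage.feature import local_binary_pattern

def mscn_features(gray, sigma=7/6, eps=1e-8):
    """Low-level branch: MSCN coefficients (I - mu) / (sigma_local + eps),
    summarized here by four sample moments. `gray` is a 2-D float array."""
    mu = gaussian_filter(gray, sigma)
    var = gaussian_filter(gray ** 2, sigma) - mu ** 2
    mscn = (gray - mu) / (np.sqrt(np.clip(var, 0, None)) + eps)
    flat = mscn.ravel()
    return np.array([flat.mean(), flat.var(), skew(flat), kurtosis(flat)])

def mlbp_features(gray, radii=(1, 2, 3)):
    """Mid-level branch: uniform LBP histograms at several radii, concatenated."""
    hists = []
    for r in radii:
        p = 8 * r  # sampling points scale with the radius
        codes = local_binary_pattern(gray, P=p, R=r, method="uniform")
        h, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
        hists.append(h)
    return np.concatenate(hists)

def clip_features(pil_image):
    """High-level branch: CLIP image embedding (assumes the open_clip package)."""
    import torch, open_clip
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="openai")
    model.eval()
    with torch.no_grad():
        emb = model.encode_image(preprocess(pil_image).unsqueeze(0))
    return emb.squeeze(0).numpy()

def fuse(*branches, eps=1e-8):
    """L2-normalize each branch before concatenation so no branch dominates."""
    return np.concatenate([b / (np.linalg.norm(b) + eps) for b in branches])

# Training, e.g. with scikit-learn, on fused vectors X and labels y
# (0 = natural, 1 = GenAI):
#   from sklearn.linear_model import LogisticRegression
#   clf = LogisticRegression(max_iter=1000).fit(X, y)
```

Per-branch normalization before concatenation is one simple way to keep the short MSCN vector, the 512-dimensional CLIP embedding, and the MLBP histograms on comparable scales; the paper's actual fusion and classifier details may differ.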

Figures (9)

  • Figure 1: Overview of the proposed multi-feature fusion pipeline. An input image is processed by three parallel feature encoders ($\Psi_{\mathrm{MSCN}}$, $\Psi_{\mathrm{CLIP}}$, $\Psi_{\mathrm{MLBP}}$). The resulting feature vectors are normalized, fused, and fed into a classifier for final prediction.
  • Figure 2: Visual comparison of texture (LBP) and contrast (GLCM) maps for a natural image and its AI-generated counterparts by DALL·E and Stable Diffusion. The MSCN contrast maps (right) reveal that AI-generated images often exhibit inconsistent local intensity distributions and 'halo' artifacts compared to the smooth, natural scene statistics of the photograph. Simultaneously, the MLBP texture maps (center) expose the underlying structural regularity and 'tiling' artifacts inherent in generative architectures, contrasting sharply with the stochastic, high-variance micro-textures of the natural sample.
  • Figure 3: Real vs. GenAI distribution separation across datasets and feature configurations. For each dataset, we report the (log-scaled) Gaussian Fréchet distance (FID-like) between the natural and GenAI feature distributions for the seven configurations (three individual feature families and their fusions). Larger values indicate greater deviation from the natural image manifold in the corresponding feature projection, while smaller values indicate stronger manifold overlap and a harder detection regime. A computation sketch for this measure is given after the figure list.
  • Figure 4: Association between manifold separation and detection performance across feature configurations. Each panel (Synthbuster, PKU, CIFAKE, FakeBench) plots average MCC (y-axis) against the $\log_{10}$ Fréchet divergence between natural and GenAI feature distributions (x-axis); points correspond to the seven feature configurations (marker-coded). The panel title reports the absolute Spearman rank correlation (SRCC $|\rho|$) computed across configurations.
  • Figure 5: t-SNE on CIFAKE (natural vs. GenAI) across all feature configurations.
  • ...and 4 more figures
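
The separation measure of Figure 3 fits a Gaussian to each feature cloud and computes the Fréchet distance between the two, the same closed form used by FID. Below is a minimal sketch under our own naming conventions; feature matrices are assumed to be arranged samples-by-dimensions. The commented line at the end shows the Spearman check from Figure 4 using scipy.

```python
# Gaussian Fréchet (FID-like) distance between natural and GenAI feature clouds.
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import spearmanr

def gaussian_frechet(feats_nat, feats_gen):
    """d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)),
    where (mu, S) are the sample mean and covariance of each cloud."""
    mu1, mu2 = feats_nat.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_nat, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2.0 * covmean))

# Figure 4's association test, given per-configuration arrays of Fréchet
# distances and MCC scores:
#   rho, _ = spearmanr(np.log10(frechet_values), mcc_values)
```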