Detecting the Undetectable: Combining Kolmogorov-Arnold Networks and MLP for AI-Generated Image Detection

Taharim Rahman Anon, Jakaria Islam Emon

TL;DR

The study tackles the rising challenge of distinguishing real from AI-generated images in the era of advanced generators like DALL-E 3, MidJourney, and Stable Diffusion 3. It couples semantic CLIP embeddings with a baseline MLP and introduces a Hybrid KAN-MLP that uses a KANLinear module with adaptive, spline-based feature transformation, achieving superior out-of-distribution (OOD) robustness. A new dataset combines real RAISE images with AI images generated by multiple modern models and is supplemented by a rigorous OOD test set to probe generalization. Empirically, the Hybrid KAN-MLP delivers higher F1 and AUC than the baseline on three OOD pairs (Real vs. DALL-E 3, Real vs. MidJourney 5, Real vs. Firefly), demonstrating the value of high-resolution, adaptive feature mappings in forensic detection. The work highlights the practical impact for media integrity and digital forensics, while acknowledging data-collection costs and proposing scalable, cost-efficient avenues for future deployment.
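The KANLinear module is named here but not specified. The sketch below is a minimal, dependency-free illustration, assuming a layer in the style of common open-source KAN implementations: every input-output edge combines a SiLU "base" term with a learnable weighted sum of B-spline basis functions. The grid range [-1, 1], `grid_size=5`, `spline_order=3`, and the random initialization are illustrative assumptions, grid adaptation during training is omitted, and only the forward pass is shown; this is not the authors' code.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU activation, x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def bspline_basis(x: float, grid_size: int = 5, spline_order: int = 3,
                  lo: float = -1.0, hi: float = 1.0) -> list[float]:
    """Evaluate all grid_size + spline_order B-spline bases at x (Cox-de Boor recursion)."""
    h = (hi - lo) / grid_size
    # Uniform knots on [lo, hi], extended by spline_order knots on each side.
    knots = [lo + (m - spline_order) * h for m in range(grid_size + 2 * spline_order + 1)]
    # Degree 0: indicator of each half-open knot interval [t_m, t_{m+1}).
    basis = [1.0 if knots[m] <= x < knots[m + 1] else 0.0 for m in range(len(knots) - 1)]
    # Raise the degree one step at a time; each step yields one fewer function.
    for p in range(1, spline_order + 1):
        basis = [
            (x - knots[m]) / (knots[m + p] - knots[m]) * basis[m]
            + (knots[m + p + 1] - x) / (knots[m + p + 1] - knots[m + 1]) * basis[m + 1]
            for m in range(len(basis) - 1)
        ]
    return basis


class KANLinear:
    """Hypothetical KAN layer: y_j = sum_i [w_base[j][i]*silu(x_i) + sum_k coef[j][i][k]*B_k(x_i)]."""

    def __init__(self, in_features: int, out_features: int,
                 grid_size: int = 5, spline_order: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.grid_size, self.spline_order = grid_size, spline_order
        n_basis = grid_size + spline_order
        # One base weight and one vector of spline coefficients per (output, input) edge.
        self.w_base = [[rng.gauss(0.0, 0.1) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(in_features)] for _ in range(out_features)]

    def forward(self, x: list[float]) -> list[float]:
        # Each input's basis values are shared by every output, so compute them once.
        bases = [bspline_basis(xi, self.grid_size, self.spline_order) for xi in x]
        out = []
        for w_row, c_row in zip(self.w_base, self.coef):
            total = 0.0
            for xi, b, w, c in zip(x, bases, w_row, c_row):
                total += w * silu(xi) + sum(ck * bk for ck, bk in zip(c, b))
            out.append(total)
        return out


layer = KANLinear(in_features=4, out_features=2)
y = layer.forward([0.1, -0.5, 0.3, 0.9])
print(len(y))  # 2 outputs, one per output feature
```

Because the spline coefficients are learned per edge, each input can receive its own nonlinear transformation, which is the "adaptive" property the TL;DR refers to; an ordinary linear layer applies only a fixed scalar weight per edge.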

Abstract

As artificial intelligence progresses, distinguishing between real and AI-generated images is increasingly complicated by sophisticated generative models. This paper presents a novel detection framework that robustly identifies images produced by cutting-edge generative AI models such as DALL-E 3, MidJourney, and Stable Diffusion 3. We introduce a comprehensive dataset that includes images from these advanced generators and serves as the foundation for extensive evaluation. We propose a classification system that integrates semantic image embeddings with a traditional Multilayer Perceptron (MLP). This baseline system is designed to differentiate between real and AI-generated images under a variety of challenging conditions. Building on this approach, we introduce a hybrid architecture that combines Kolmogorov-Arnold Networks (KAN) with the MLP. The hybrid model leverages the adaptive, high-resolution feature transformation capabilities of KAN, enabling the system to capture complex patterns in AI-generated images that conventional models typically overlook. In out-of-distribution testing, the proposed model consistently outperformed the standard MLP on all three out-of-distribution test datasets, achieving higher F1 scores and demonstrating superior performance and robustness in distinguishing real images from AI-generated ones.
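To make the baseline concrete, here is a hypothetical sketch of an MLP head applied to a precomputed image embedding. It assumes a 512-dimensional embedding (the size produced by CLIP ViT-B/32; the abstract does not say which embedding model or variant is used), L2 normalization of the embedding, one hidden layer of 128 ReLU units, and label 1 = AI-generated. The weights are random and untrained, so the printed probability only demonstrates the data flow, not a real prediction.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (assumed preprocessing for CLIP-style embeddings)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


class MLPHead:
    """Hypothetical baseline: embedding -> Linear -> ReLU -> Linear -> sigmoid P(AI-generated)."""

    def __init__(self, in_dim: int = 512, hidden: int = 128, seed: int = 0):
        rng = random.Random(seed)
        bound = 1.0 / math.sqrt(in_dim)
        self.w1 = [[rng.uniform(-bound, bound) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def predict_proba(self, emb: list[float]) -> float:
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * x for w, x in zip(row, emb)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Single logit -> probability that the image is AI-generated (label 1, assumed).
        return sigmoid(sum(w * a for w, a in zip(self.w2, h)) + self.b2)


# Stand-in for a real CLIP image embedding (random, for illustration only).
rng = random.Random(1)
emb = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(512)])
head = MLPHead()
p = head.predict_proba(emb)
label = "AI-generated" if p >= 0.5 else "real"
print(f"P(AI-generated) = {p:.3f} -> {label}")
```

In the hybrid variant described above, a KAN layer would transform the embedding before (or in place of) one of these dense layers; the exact placement in the authors' architecture is not stated in this summary.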

Paper Structure

This paper contains 12 sections, 12 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Illustration of the data generation process in our study. Descriptions of real images are first generated using a language model. These descriptions are then used as prompts for state-of-the-art models such as Stable Diffusion 3 Ultra (SD 3 Ultra), DALL-E 3, and MidJourney 6 to generate AI images.
  • Figure 2: Images generated by generative AI models from descriptions of real images.
  • Figure 3: Proposed methodology for the AI-generated image detection framework.
  • Figure 4: Comparison of confusion matrices for the proposed hybrid classification approach and the baseline approach across three datasets. Subfigures (a), (b), and (c) show the performance of the proposed approach on the Real vs. DALL-E 3, Real vs. MidJourney, and Real vs. Adobe Firefly datasets, respectively; subfigures (d), (e), and (f) show the baseline MLP approach on the same datasets. Each matrix breaks down true positive, false positive, true negative, and false negative outcomes, illustrating how effectively each approach distinguishes real images from AI-generated ones.
  • Figure 5: ROC curves illustrating the performance of the proposed hybrid classification approach on three datasets: (a) Real vs. DALL-E 3, (b) Real vs. MidJourney, and (c) Real vs. Adobe Firefly. The curves show the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity) across classification thresholds, reflecting the classifier's ability to distinguish real from AI-generated images. The area under the curve (AUC) summarizes the classifier's overall performance in each testing scenario.
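The F1 scores, confusion matrices (Figure 4), and ROC/AUC values (Figure 5) can all be derived from a detector's scores. The sketch below does this in plain Python; the toy labels and scores are made up for illustration and are not results from the paper, and treating AI-generated as the positive class (label 1) with a 0.5 decision threshold is an assumption. AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 (AI-generated, assumed) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2; equals ROC AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold down through every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"F1={f1_score(tp, fp, fn):.3f} AUC={roc_auc(y_true, scores):.3f}")  # F1=0.667 AUC=0.889
```

The pairwise AUC loop is O(P·N) in the number of positives and negatives; for large test sets a sort-based computation or a library routine such as scikit-learn's `roc_auc_score` is more practical.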