Table of Contents
Fetching ...

Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis

Preetu Mehta, Aman Sagar, Suchi Kumari

TL;DR

The Swin Transformer model's strong performance across multiple datasets indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.

Abstract

\textbf{Purpose} This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images in the RGB color space. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. \textbf{Methods} The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features crucial for distinguishing CGI from natural images. The model's performance was evaluated through intra-dataset and inter-dataset testing across three distinct datasets: CiFAKE, JSSSTU, and Columbia. The datasets were tested individually (D1, D2, D3) and in combination (D1+D2+D3) to assess the model's robustness and domain generalization capabilities. \textbf{Results} The Swin Transformer-based model demonstrated high accuracy, consistently achieving a range of 97-99\% across all datasets and testing scenarios. These results confirm the model's effectiveness in detecting CGI, showcasing its robustness and reliability in both intra-dataset and inter-dataset evaluations. \textbf{Conclusion} The findings of this study highlight the Swin Transformer model's potential as an advanced tool for digital image forensics, particularly in distinguishing CGI from natural images. The model's strong performance across multiple datasets indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.

Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis

TL;DR

The Swin Transformer model's strong performance across multiple datasets indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.

Abstract

\textbf{Purpose} This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images in the RGB color space. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. \textbf{Methods} The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features crucial for distinguishing CGI from natural images. The model's performance was evaluated through intra-dataset and inter-dataset testing across three distinct datasets: CiFAKE, JSSSTU, and Columbia. The datasets were tested individually (D1, D2, D3) and in combination (D1+D2+D3) to assess the model's robustness and domain generalization capabilities. \textbf{Results} The Swin Transformer-based model demonstrated high accuracy, consistently achieving a range of 97-99\% across all datasets and testing scenarios. These results confirm the model's effectiveness in detecting CGI, showcasing its robustness and reliability in both intra-dataset and inter-dataset evaluations. \textbf{Conclusion} The findings of this study highlight the Swin Transformer model's potential as an advanced tool for digital image forensics, particularly in distinguishing CGI from natural images. The model's strong performance across multiple datasets indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.
Paper Structure (13 sections, 5 equations, 4 figures, 3 tables)

This paper contains 13 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Examples of Computer Generated (CG) images from the CIFAKE-10, Columbia RCGI, and JSSSTU datasets (from top to bottom row, respectively). These images illustrate the challenge of distinguishing between CG images and natural images with the naked eye. The variation is also highlighted in computer-generated images across different datasets.
  • Figure 2: Architecture of the proposed Swin transformer and the expansion of Swin Transformer Block under it.
  • Figure 3: The plots illustrate the t-SNE plot of the extracted features from the Swin Tranformer for the CiFake, columbia PRCG and real images, JSSSTU and combined all three dataset images (from left to right, top to bottom).
  • Figure 4: The training and validation accuracy plot, loss plot and ROC (from left to right) for the datasets CiFAKE, columbia, JSSSTU, and combined three datasets (top to bottom).