Enhancing Image Authenticity Detection: Swin Transformers and Color Frame Analysis for CGI vs. Real Images

Preeti Mehta, Aman Sagar, Suchi Kumari

TL;DR

This paper tackles the challenge of distinguishing highly realistic CGI from authentic digital camera images by employing a Swin Transformer classifier augmented with color-frame analysis in RGB and YCbCr (CbCrY) spaces. By leveraging the Swin Transformer's hierarchical self-attention and incorporating color-frame preprocessing, the approach achieves high accuracy and robustness against common manipulations, as demonstrated on the CIFAKE-10 dataset where RGB inputs reach 98% accuracy. A t-SNE visualization corroborates improved feature separability when color-frame features are used, and the method consistently outperforms several CNN baselines. The work highlights the practical potential of transformer-based, color-aware representations for rapid, robust image authenticity detection, while outlining avenues for domain generalization, platform-specific CGI detection, and ethical considerations in future research.

Abstract

Rapid advances in computer graphics have greatly enhanced the quality of computer-generated images (CGI), making them increasingly indistinguishable from authentic images captured by digital cameras (ADI). This indistinguishability poses significant challenges, especially in an era of widespread misinformation and digitally fabricated content. This research proposes a novel approach to classifying CGI and ADI using Swin Transformers and preprocessing techniques involving RGB and CbCrY color frame analysis. By harnessing the capabilities of Swin Transformers, our method forgoes handcrafted features, relying instead on raw pixel data for model training. This approach achieves state-of-the-art accuracy while offering substantial improvements in processing speed and robustness against joint image manipulations such as noise addition, blurring, and JPEG compression. Our findings highlight the potential of Swin Transformers combined with advanced color frame analysis for effective and efficient image authenticity detection.
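
As a rough illustration of the CbCrY color-frame preprocessing mentioned in the abstract, the sketch below converts 8-bit RGB pixels to luma and chroma values and stores them in Cb, Cr, Y order. The abstract does not state the exact conversion, so the full-range ITU-R BT.601 (JFIF) coefficients and the channel order implied by the name "CbCrY" are assumptions, and the function names `rgb_to_cbcry` and `image_to_cbcry` are hypothetical.

```python
def rgb_to_cbcry(r, g, b):
    """Convert one 8-bit RGB pixel to a (Cb, Cr, Y) tuple.

    Assumption: full-range BT.601 (JFIF) weights; the paper's exact
    conversion is not given in this section.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    # Assumption: channel order follows the name "CbCrY".
    return clamp(cb), clamp(cr), clamp(y)


def image_to_cbcry(pixels):
    """Apply the conversion to an image given as rows of (R, G, B) tuples."""
    return [[rgb_to_cbcry(*px) for px in row] for row in pixels]


if __name__ == "__main__":
    tiny_image = [[(255, 0, 0), (0, 0, 0)],
                  [(255, 255, 255), (120, 120, 120)]]
    for row in image_to_cbcry(tiny_image):
        print(row)
```

Neutral pixels (equal R, G, and B) map to Cb = Cr = 128, so any color information ends up in the first two channels while brightness is isolated in the third. In practice a vectorized library routine would replace the per-pixel loop.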

Paper Structure (8 sections, 6 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Examples of computer-generated (CG) images from the CIFAKE-10 dataset. The images show the difficulty of distinguishing between the two classes of images with the naked eye.
  • Figure 2: Architecture of the proposed Swin Transformer
  • Figure 3: t-SNE plots of the features extracted by the Swin Transformer: (a) for RGB input images; (b) for CbCrY input images.
  • Figure 4: Plots (a) and (b) show the training and validation accuracy and loss, respectively, of the proposed model for RGB input images over five epochs. Plots (c) and (d) show the training and validation results for CbCrY input images over five epochs.
  • Figure 5: ROC plots (a) and (b) for RGB and CbCrY input images, respectively, for the proposed model using the Swin Transformer.