A Robust Image Forensic Framework Utilizing Multi-Colorspace Enriched Vision Transformer for Distinguishing Natural and Computer-Generated Images

Manjary P. Gangan; Anoop Kadan; Lajish V L

TL;DR

This work proposes a robust forensic classifier framework that fuses enriched vision transformers operating in the RGB and YCbCr color spaces, achieving higher classification accuracy and robustness against the post-processing operations of JPEG compression and Gaussian noise addition.

Abstract

Digital image forensics research on distinguishing natural from computer-generated images has focused primarily on binary tasks: natural images versus computer graphics images only, or natural images versus GAN-generated images only, but not natural images versus both types of generated images simultaneously. Furthermore, although advanced convolutional neural networks and transformer-based architectures achieve impressive accuracy on this forensic classification task, these models are observed to fail on images that have undergone post-processing operations intended to deceive forensic algorithms, such as JPEG compression and Gaussian noise addition. In this work, we address the task of distinguishing natural images from computer-generated images, encompassing both computer graphics and GAN-generated images, and propose a robust forensic classifier framework leveraging enriched vision transformers. By employing a fusion approach for the networks operating in the RGB and YCbCr color spaces, we achieve higher classification accuracy and robustness against the post-processing operations of JPEG compression and Gaussian noise addition. Our approach outperforms the baselines, demonstrating 94.25% test accuracy with significant performance gains in individual class accuracies. Visualizations of feature representations and attention maps reveal improved separability and improved capture of information relevant to the forensic task. This work advances the state of the art in image forensics by providing a generalized and resilient solution for distinguishing natural from generated images.
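To make the two ingredients named in the abstract concrete, the following is a minimal sketch of (a) producing an RGB view and a YCbCr view of the same image and (b) perturbing an image with additive Gaussian noise. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`rgb_to_ycbcr`, `two_colorspace_views`, `add_gaussian_noise`) are hypothetical, the conversion uses the full-range JFIF (BT.601) coefficients because the abstract does not say which YCbCr variant is used, and images are plain nested lists of 8-bit pixel tuples. JPEG compression is not shown, since it requires an image codec library.

```python
import random


def clip8(value):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(pixel):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: JFIF (BT.601) coefficients; the paper may use another variant.
    """
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (clip8(y), clip8(cb), clip8(cr))


def two_colorspace_views(image):
    """Return (rgb_view, ycbcr_view) for an image given as rows of RGB tuples.

    Hypothetical helper: each view would feed its own network before fusion.
    """
    rgb_view = [list(row) for row in image]
    ycbcr_view = [[rgb_to_ycbcr(px) for px in row] for row in image]
    return rgb_view, ycbcr_view


def add_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to every channel."""
    rng = random.Random(seed)
    return [
        [tuple(clip8(c + rng.gauss(0.0, sigma)) for c in px) for px in row]
        for row in image
    ]


if __name__ == "__main__":
    image = [[(255, 255, 255), (0, 0, 0)], [(128, 128, 128), (200, 30, 90)]]
    rgb_view, ycbcr_view = two_colorspace_views(image)
    noisy = add_gaussian_noise(image, sigma=5.0, seed=0)
    print(ycbcr_view[0])
    print(noisy[0])
```

In a robustness evaluation of the kind the abstract describes, a perturbed copy such as `noisy` would be classified alongside the clean image so the accuracy drop can be measured; how the two color-space networks are fused is specified in the paper's methodology, not here.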

Paper Structure (15 sections, 2 equations, 8 figures, 5 tables)

Introduction
Related work
Our work in context
Methodology
Dataset
Motivation
Network architecture
Experimental Settings
Baselines
Results and Discussions
Robustness against Post-processing
Generalizability
Feature Visualization
Attention visualization
Conclusion

Figures (8)

Figure 1: Decline in classification accuracies of the models for the same set of images at varying levels of JPEG compression
Figure 2: Rate of decrease in accuracies due to compression differs for different classes
Figure 3: The overall architecture of the proposed model Multi-Colorspace fused and Enriched Vision Transformer (MCE-ViT)
Figure 4: Confusion matrix and DET curve of the proposed model MCE-ViT
Figure 5: Classification accuracies of the proposed model and the baselines for various JPEG compression quality factors
...and 3 more figures
