Domain Generalized Recaptured Screen Image Identification Using SWIN Transformer
Preeti Mehta, Aman Sagar, Suchi Kumari
TL;DR
This work tackles the detection of images recaptured from LCD screens under domain shifts and scale variation. It presents DAST-DG, which combines a cascaded data-augmentation strategy with a SWIN Transformer-based domain-generalization framework. In this framework, a feature generator is adversarially trained against a domain discriminator, and a multi-stage hierarchical representation is used. Experiments on the NTU-ROSE, ICL, and Mturk datasets show strong intra-domain performance and improved cross-domain generalization, with accuracy of around 82% and precision of up to 95% on high-variance data, surpassing several baselines. The approach offers practical benefits for forensic applications such as countering insurance fraud, face spoofing, and video piracy, because it enables robust detection across diverse capture conditions and displays.
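The TL;DR does not spell out the individual stages of the augmentation cascade. The following minimal sketch assumes that a "cascade" means applying augmentation stages one after another. It also assumes that scale variation is simulated by random nearest-neighbour rescaling followed by a centre crop or pad back to a fixed size. The function names, the 0.5 to 1.5 scale range, the flip stage, and the list-of-lists grayscale image representation are all hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, List

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[float]]
Transform = Callable[[Image, random.Random], Image]


def rescale_nearest(img: Image, factor: float) -> Image:
    """Resize by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def center_crop_or_pad(img: Image, size: int, fill: float = 0.0) -> Image:
    """Crop (if larger) or pad with `fill` (if smaller) to a size x size image."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2  # negative offsets mean padding
    out = [[fill] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            sr, sc = r + top, c + left
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out


def random_scale(low: float, high: float, size: int) -> Transform:
    """Assumed stage: random rescale, then restore a fixed input size."""
    def apply(img: Image, rng: random.Random) -> Image:
        return center_crop_or_pad(rescale_nearest(img, rng.uniform(low, high)), size)
    return apply


def random_hflip(p: float = 0.5) -> Transform:
    """Assumed stage: horizontal flip with probability p."""
    def apply(img: Image, rng: random.Random) -> Image:
        return [row[::-1] for row in img] if rng.random() < p else img
    return apply


def cascade(*stages: Transform) -> Transform:
    """Apply each stage to the output of the previous one."""
    def apply(img: Image, rng: random.Random) -> Image:
        for stage in stages:
            img = stage(img, rng)
        return img
    return apply


rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
augment = cascade(random_scale(0.5, 1.5, size=8), random_hflip())
out = augment(image, rng)
print(len(out), len(out[0]))  # 8 8
```

Restoring a fixed size after the random rescale keeps the input shape constant for the backbone. Meanwhile, the content inside the frame is shown at different scales, which is the kind of variation the framework targets.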
Abstract
An increasing number of classification approaches have been developed to address image rebroadcast and recapture, a common attack strategy in insurance fraud, face spoofing, and video piracy. However, most of them neglect scale variation and domain-generalization scenarios. As a result, they perform poorly under domain shifts, which are typically aggravated by inter-domain and cross-domain scale variance. To overcome these issues, we propose a cascaded data augmentation and SWIN transformer domain generalization framework (DAST-DG). We first examine the disparity in dataset representations. A feature generator is then trained to make authentic images from different domains indistinguishable to a domain discriminator. The same process is applied to recaptured images, creating a dual adversarial learning setup. Extensive experiments show that our approach is practical and surpasses state-of-the-art methods across different databases. Our model achieves an accuracy of approximately 82% with a precision of 95% on high-variance datasets.
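The abstract describes the dual adversarial setup only at a high level. One plausible reading is a DANN-style minimax game with two domain discriminators, one fed features of authentic images and one fed features of recaptured images. Each discriminator minimises its domain-classification cross-entropy. The feature generator minimises the recapture-classification loss minus λ times the sum of both discriminator losses. The helper names, the single weight `lam`, the three source domains in the example, and the use of one discriminator per class are hypothetical assumptions. The sketch computes only the scalar objectives. In an actual training loop, each loss would update its own parameters, or a gradient reversal layer would achieve the same effect.

```python
import math
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def domain_ce(logits_batch: Sequence[Sequence[float]], domains: Sequence[int]) -> float:
    """Mean cross-entropy of a domain discriminator over one batch."""
    if not logits_batch:
        return 0.0
    total = sum(-log_softmax(logits)[d] for logits, d in zip(logits_batch, domains))
    return total / len(logits_batch)


def dual_adversarial_objectives(
    cls_loss: float,
    auth_logits: Sequence[Sequence[float]],
    auth_domains: Sequence[int],
    recap_logits: Sequence[Sequence[float]],
    recap_domains: Sequence[int],
    lam: float = 0.1,
) -> Dict[str, float]:
    """Scalar objectives for a hypothetical two-discriminator minimax game."""
    l_auth = domain_ce(auth_logits, auth_domains)     # discriminator on authentic features
    l_recap = domain_ce(recap_logits, recap_domains)  # discriminator on recaptured features
    return {
        # Discriminators try to identify the source domain: minimise their loss.
        "discriminators": l_auth + l_recap,
        # Generator stays discriminative for recapture detection while
        # maximising both discriminators' losses (domain-invariant features).
        "generator": cls_loss - lam * (l_auth + l_recap),
    }


# Three source domains (assumed). Uniform logits mean the discriminators are
# maximally confused, so each domain loss equals log(3).
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
objs = dual_adversarial_objectives(0.5, uniform, [0, 1], uniform, [1, 2], lam=0.1)
print(round(objs["discriminators"], 4), round(objs["generator"], 4))  # 2.1972 0.2803
```

Setting `lam` to 0 reduces the generator objective to plain supervised training. Larger values push harder toward domain-invariant features, possibly at some cost to class separation.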
