Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images

Rifat Sadik; Tanvir Rahman; Arpan Bhattacharjee; Bikash Chandra Halder; Ismail Hossain; Rifat Sarker Aoyon; Md. Golam Rabiul Alam; Jia Uddin

TL;DR

This study investigates adversarial watermarking in Vision Transformer (ViT) models applied to medical skin imaging, focusing on transferability to CNNs and defense via adversarial training. Adversarial perturbations are generated with Projected Gradient Descent (PGD) and evaluated across ViT, ResNet-50, and VGG16. Under attack, ViT accuracy drops sharply, to as low as 27.6%, while adversarial training restores accuracy to as high as 90.0% across architectures. The work highlights security implications for dermatology AI and points to future work on dynamic training strategies and larger, more diverse datasets to ensure resilience in clinical settings.
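For reference, the standard L∞ formulation of PGD and the min-max objective behind adversarial training are shown below. The symbols (classifier f_θ, loss 𝓛, perturbation radius ε, step size α, number of steps T, input dimension d) are generic. This summary does not state the paper's specific values. In this formulation, the final perturbation x^(T) − x plays the role of the adversarial watermark.

```latex
% L-infinity PGD: signed-gradient ascent on the loss, projected back onto the
% eps-ball around the clean image x and the valid pixel range [0,1]^d
x^{(0)} = x, \qquad
x^{(t+1)} = \Pi_{\mathcal{B}_\infty(x,\epsilon)\,\cap\,[0,1]^d}\!\Big( x^{(t)} + \alpha\,\operatorname{sign}\big(\nabla_{x}\,\mathcal{L}(f_\theta(x^{(t)}),\, y)\big) \Big),
\quad t = 0,\dots,T-1

% Adversarial training: minimize the worst-case loss inside the same eps-ball
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]
```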

Abstract

Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Previously, convolutional neural network (CNN) based architectures achieved immense popularity and success in computer vision (CV) tasks such as skin image recognition, generation, and video analysis. With the emergence of transformer-based models, CV tasks are now increasingly carried out with these models. Vision Transformers (ViTs) are one such family of transformer-based models that have shown success in computer vision; they use self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper investigates the susceptibility of ViTs applied to medical images to adversarial watermarking, a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks with Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance of a defense mechanism, adversarial training. Results indicate that while performance is not compromised on clean images, ViTs become much more vulnerable under adversarial attack, with accuracy dropping to as low as 27.6%. Nevertheless, adversarial training raises accuracy to as high as 90.0%.
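The pipeline in the abstract has three stages: craft a PGD perturbation against one model, measure how well it transfers to another, then retrain on perturbed inputs. The sketch below runs all three end to end in pure Python. It is a minimal illustration built on assumptions, not the authors' code. A logistic-regression classifier on two-feature toy inputs stands in for the ViT, ResNet-50, and VGG16 models. The "target" model differs from the "source" only in its training hyperparameters, which makes it a crude proxy for a different architecture. Every name (`pgd_attack`, `train`, the toy `data`) and every value (ε = 0.2, 10 steps, step size 2.5ε/steps) is hypothetical.

```python
import math

# Hypothetical toy stand-in: a logistic-regression classifier on 2-feature inputs in
# [0, 1] replaces the ViT/ResNet-50/VGG16 models; data and hyperparameters are invented.

def sigmoid(z):
    # Numerically stable logistic function (never exponentiates a positive number).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_prob(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0

def pgd_attack(w, b, x, y, eps=0.1, alpha=None, steps=10):
    """L-infinity PGD: signed-gradient ascent on the loss, projected onto the
    eps-ball around x and the valid pixel range [0, 1]."""
    alpha = 2.5 * eps / steps if alpha is None else alpha
    x_adv = list(x)
    for _ in range(steps):
        p = predict_prob(w, b, x_adv)
        grad = [(p - y) * wi for wi in w]  # d(BCE)/dx for a linear model
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

def accuracy(w, b, data):
    return sum((predict_prob(w, b, x) >= 0.5) == (y == 1) for x, y in data) / len(data)

def train(data, epochs=200, lr=0.5, adversarial=False, eps=0.1):
    """Plain SGD on binary cross-entropy; with adversarial=True each example is
    replaced by a PGD example crafted against the current weights."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_in = pgd_attack(w, b, x, y, eps=eps) if adversarial else x
            p = predict_prob(w, b, x_in)
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_in)]
            b -= lr * (p - y)
    return w, b

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
data = [([0.8, 0.5], 1), ([0.75, 0.4], 1), ([0.85, 0.6], 1),
        ([0.2, 0.5], 0), ([0.25, 0.6], 0), ([0.15, 0.4], 0)]

src_w, src_b = train(data)                       # "source" model the attacker can see
tgt_w, tgt_b = train(data, epochs=100, lr=0.1)   # "target" model, attacked by transfer only
adv_data = [(pgd_attack(src_w, src_b, x, y, eps=0.2), y) for x, y in data]
print("source clean/adv:", accuracy(src_w, src_b, data), accuracy(src_w, src_b, adv_data))
print("target on transferred:", accuracy(tgt_w, tgt_b, adv_data))

# Defense: retrain on PGD examples, then attack the retrained model directly.
rob_w, rob_b = train(data, adversarial=True, eps=0.2)
rob_adv = [(pgd_attack(rob_w, rob_b, x, y, eps=0.2), y) for x, y in data]
print("adv-trained under PGD:", accuracy(rob_w, rob_b, rob_adv))
```

A linear model's input gradient has a closed form, so the sketch needs no autodiff library. With deep models such as those in the paper, the same loop would get the gradient from an autodiff framework. The printed accuracies depend entirely on the toy data and cannot be compared with the paper's 27.6% and 90.0% figures.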