S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks

Neha A S; Vivek Chaturvedi; Muhammad Shafique

TL;DR

This paper addresses the adversarial vulnerability of Vision Transformer (ViT) classifiers in medical imaging by introducing the Segmentation-Enhancement (S-E) Pipeline. The approach combines region-of-interest (ROI) segmentation via a U-Net with image enhancement techniques (CLAHE, Unsharp Masking, and High-Frequency Emphasis filtering) as a preprocessing layer before ViT classification, and evaluates robustness under FGSM and PGD attacks. Empirical results show substantial reductions in attack impact: up to 72.22% and 86.58% for FGSM on ViT-b32 and ViT-l32 respectively, and up to 36.25% and 80.26% for PGD. The method is additionally validated on CNNs and deployed on the NVIDIA Jetson Orin Nano. The work demonstrates a practical, edge-device-friendly defense for medical imaging, enabling more reliable automated diagnosis in resource-constrained environments.
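For readers unfamiliar with the two attacks named above, the following is a minimal sketch of FGSM and PGD, not the authors' code. It assumes a toy logistic-regression classifier in place of a ViT, inputs scaled to [0, 1], and hypothetical names (`predict`, `fgsm`, `pgd`, `alpha`, `steps`) chosen only for illustration. FGSM takes one step of size `eps` along the sign of the loss gradient with respect to the input; PGD repeats smaller steps and projects back into the `eps`-ball around the original input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic model (stand-in for a ViT)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single-step attack: x_adv = clip(x + eps * sign(grad_x loss))."""
    grad = input_gradient(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated signed-gradient steps, each projected into the L-infinity eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        stepped = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project onto [x - eps, x + eps], then onto the valid pixel range [0, 1].
        x_adv = [min(1.0, max(0.0, min(x0 + eps, max(x0 - eps, s))))
                 for x0, s in zip(x, stepped)]
    return x_adv

if __name__ == "__main__":
    w, b = [2.0, -3.0, 1.0], 0.1          # hypothetical model weights
    x, y = [0.5, 0.2, 0.7], 1             # hypothetical input and true label
    print(predict(w, b, x))
    print(predict(w, b, fgsm(w, b, x, y, eps=0.1)))
    print(predict(w, b, pgd(w, b, x, y, eps=0.1, alpha=0.025, steps=10)))
```

Both attacks lower the model's confidence in the true label while changing no input component by more than `eps`; the epsilon values and step schedule here are illustrative and are not taken from the paper.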

Abstract

Vision Transformers (ViTs) are becoming widely popular for automating accurate disease diagnosis in medical imaging owing to their robust self-attention mechanism. However, ViTs remain vulnerable to adversarial attacks that may thwart the diagnosis process by inducing deliberate misclassification of critical diseases. In this paper, we propose a novel image classification pipeline, the S-E Pipeline, which performs multiple preprocessing steps that allow the ViT to be trained on critical features, reducing the impact of adversarial input perturbations. Our method combines segmentation with image enhancement techniques, namely Contrast Limited Adaptive Histogram Equalization (CLAHE), Unsharp Masking (UM), and High-Frequency Emphasis filtering (HFE), as preprocessing steps to identify critical features that remain intact even after adversarial perturbation. The experimental study demonstrates that our pipeline reduces the effect of adversarial attacks by 72.22% for the ViT-b32 model and 86.58% for the ViT-l32 model. Furthermore, we demonstrate an end-to-end deployment of the proposed method on the NVIDIA Jetson Orin Nano board to show its practical use in modern hand-held devices, which are usually resource-constrained.
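As an illustration of one of the enhancement steps named in the abstract, the sketch below shows the general idea of Unsharp Masking: subtract a blurred copy from the image and add the scaled difference back. It is a minimal, standard-library sketch and not the authors' implementation. The 3x3 box blur is a stand-in for the Gaussian blur usually used, and the names `box_blur`, `unsharp_mask`, and `amount` are assumptions made for this example.

```python
def box_blur(img):
    """3x3 mean filter with edge replication (stand-in for a Gaussian blur)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at borders
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at borders
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

def unsharp_mask(img, amount=1.0):
    """sharpened = original + amount * (original - blurred), clipped to [0, 255]."""
    blurred = box_blur(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]

if __name__ == "__main__":
    # Hypothetical 4x4 grayscale patch with a vertical edge between columns 1 and 2.
    patch = [[50, 50, 150, 150] for _ in range(4)]
    for row in unsharp_mask(patch):
        print([round(v, 2) for v in row])
```

Flat regions are left unchanged, while pixels on either side of an edge are pushed further apart, which increases local contrast along boundaries.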

Paper Structure (20 sections, 2 equations, 12 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Proposed Method: S-E Pipeline
Segmentation
Training using ViT
Calculating Rate of Reduction
Experimental setup
Model Specifications and Architectures
Adversarial Attacks
Image Enhancement Techniques
CLAHE
Unsharp Masking
High-frequency Emphasis filtering
Experiments and Results
Performance of defense mechanism in CNN
...and 5 more sections

Figures (12)

Figure 1: Overall workflow of the proposed system.
Figure 2: Segmentation using custom U-Net.
Figure 3: Workflow of the training process of the proposed defense mechanism in ViT.
Figure 4: Comparison of output predictions from normal and corresponding adversarial image.
Figure 12: Rate of reduction for ViT using CLAHE after performing FGSM attack for epsilon values from 0.001 to 0.003.
...and 7 more figures
