A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison
Chinthana Wimalasuriya, Spyros Tragoudas
TL;DR
The paper tackles the vulnerability of neural networks to adversarial inputs by introducing an attack-agnostic detector whose baseline is established before deployment, using a compressed/uncompressed network pair and a distribution-identity metric. Online, it computes a run-time score $P_A = \|V_R - V_C\|_2$ from KL-divergence-driven distances and an identity measure derived from Mann-Whitney $p$-values, and compares it against a universal threshold $T$ to separate clean from adversarial samples. The method achieves near-perfect detection across multiple attack types (e.g., FGSM, PGD, Square, DeepFool, CW) on CIFAR-10, CIFAR-100, and a 50-class ImageNet subset, with dataset-specific thresholds yielding low false positives. By combining JPEG2000-based denoising with distribution-identity matching, the approach offers a robust defense suitable for real-world deployment and reduces dependence on attack-specific training data.
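As a rough illustration of the final scoring step only, the sketch below computes $P_A = \|V_R - V_C\|_2$ for two already-built vectors and compares it to a threshold $T$. It does not reproduce how the paper constructs $V_R$ and $V_C$ from KL-divergence distances and Mann-Whitney $p$-values. The function names, the example vectors, the value of `T`, and the rule that a score above the threshold means "adversarial" are all assumptions made for illustration.

```python
import math


def adversarial_score(v_ref, v_cur):
    """Return the L2 (Euclidean) distance between two equal-length vectors.

    Hypothetical helper: stands in for P_A = ||V_R - V_C||_2.
    """
    if len(v_ref) != len(v_cur):
        raise ValueError("vectors must have the same length")
    return math.dist(v_ref, v_cur)


def is_adversarial(v_ref, v_cur, threshold):
    """Flag a sample when its score exceeds the threshold.

    Assumption: larger scores indicate adversarial inputs; the summary
    only says that T separates clean from adversarial samples.
    """
    return adversarial_score(v_ref, v_cur) > threshold


# Made-up vectors and threshold, chosen only to exercise the functions.
v_r = [0.0, 0.0]
v_c_close = [0.3, 0.4]  # small deviation from the reference
v_c_far = [3.0, 4.0]    # large deviation from the reference
T = 1.0

print(is_adversarial(v_r, v_c_close, T))
print(is_adversarial(v_r, v_c_far, T))
```

With these made-up values the first sample falls below the threshold and the second above it. In practice $T$ would be calibrated before deployment, as the summary describes.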
Abstract
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect unseen attacks or detect different attack types with a high level of accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network's deployment, enabling effective real-time adversarial detection. We generate a metric of adversarial presence by comparing the behavior of a compressed/uncompressed neural network pair. Our method has been tested against state-of-the-art techniques, and it achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.
