Bridging Optimal Transport and Jacobian Regularization by Optimal Trajectory for Enhanced Adversarial Defense
Binh M. Le, Shahroz Tariq, Simon S. Woo
TL;DR
This work tackles adversarial vulnerability in vision models by comparing Adversarial Training (AT) and Jacobian Regularization (JR) and proposing a unified defense, OTJR, that uses the Sliced Wasserstein distance to compute optimal latent trajectories. By replacing random projections with informative trajectories in Jacobian regularization and jointly minimizing the transport distance between clean and adversarial representations, OTJR achieves strong robustness on CIFAR-10/100 with competitive clean accuracy and favorable convergence behavior. The approach is validated across white-box and black-box attacks, online adversarial scenarios, and large-scale datasets, and is shown to be compatible with existing defense frameworks. The findings demonstrate that integrating optimal transport insights into adversarial defenses yields practical improvements and real-world resilience, highlighting the method's significance for secure deployment of deep learning systems.
Abstract
Deep neural networks, particularly in vision tasks, are notably susceptible to adversarial perturbations. Overcoming this challenge requires a robust classifier. In light of recent advances in classifier robustness, we examine adversarial training and Jacobian regularization, two pivotal defenses. Our work is the first to carefully analyze and characterize these two schools of approaches, both theoretically and empirically, to demonstrate how each impacts the robust learning of a classifier. Next, we propose our novel Optimal Transport with Jacobian regularization method, dubbed OTJR, which bridges input Jacobian regularization with output representation alignment by leveraging optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance, which can efficiently push the adversarial samples' representations closer to those of clean samples, regardless of the number of classes within the dataset. The SW distance also provides the adversarial samples' movement directions, which are much more informative and powerful for Jacobian regularization than random projections. In our empirical evaluations, our method achieves accuracies of 52.57% on CIFAR-10 and 28.3% on CIFAR-100 under AutoAttack. To further validate our model's practicality, we conducted real-world tests by subjecting internet-sourced images to online adversarial attacks. These demonstrations highlight our model's capability to counteract sophisticated adversarial perturbations, affirming its significance and applicability in real-world scenarios.
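To make the role of the Sliced Wasserstein distance concrete, the following is a minimal sketch of a Monte Carlo estimate of the sliced distance between a batch of clean representations and a batch of adversarial representations. It is an illustration under stated assumptions, not the authors' implementation: the function name `sliced_wasserstein`, the parameters `n_projections`, `p`, and `seed`, and the use of plain Python lists in place of network outputs are all hypothetical choices made here.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    equally sized point sets xs and ys (lists of d-dimensional vectors).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized batches")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere (normalized Gaussian).
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both sets onto the direction and sort. In one dimension the
        # optimal transport plan pairs the i-th smallest with the i-th smallest.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(b * t for b, t in zip(y, theta)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    # Average over projections, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)


# Toy usage: "adversarial" representations are clean ones shifted by a constant.
clean = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adversarial = [[a + 0.5, b + 0.5] for a, b in clean]
distance = sliced_wasserstein(clean, adversarial)
```

Because each projection reduces the problem to sorting one-dimensional values, the cost depends on the batch size and the representation dimension rather than on the number of classes, which is consistent with the abstract's remark about class count. In a training setting, the sorted pairing on each slice also indicates where each adversarial representation would need to move, which is the kind of direction the abstract describes feeding into Jacobian regularization.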
