Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song
TL;DR
This work tackles the challenge of adversarial transferability across model genera (e.g., from CNNs to Vision Transformers). It introduces the Deformation-Constrained Warping Attack (DeCoWA), which embeds a deformation-based input transformation (DeCoW) with adaptive constraints into a gradient-based attack to diversify local geometry while preserving global semantics. The method yields substantial transfer gains across image, video, and audio tasks and outperforms state-of-the-art input-augmentation baselines. Grad-CAM analyses support these findings, showing that CNNs adopt more global attention under DeCoW. These results establish a strong, modality-spanning baseline for cross-genus adversarial transferability and highlight avenues for defense and further cross-domain research.
Abstract
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they perform poorly when attacking systems whose model genus differs from that of the surrogate model. In this paper, we propose a novel and generic attack strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross-model-genus attacks. Specifically, DeCoWA first augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid the severe distortion of global semantics caused by random deformation, DeCoW further constrains the strength and direction of the warping transformation with a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is available at https://github.com/LinQinLiang/DeCoWA.
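To make the idea of a bounded elastic deformation concrete, the following is a minimal NumPy sketch of a generic smooth random warp whose per-pixel displacement never exceeds a fixed number of pixels. It is an illustration only and not the released DeCoWA code: the function names (`bilinear_sample`, `constrained_warp`), the parameters `max_shift` and `grid`, and the use of a coarse random offset grid with bilinear upsampling are all assumptions made for this example. In particular, the fixed bound `max_shift` merely stands in for the adaptive control strategy described above, which is not reproduced here.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample a 2-D array at float coordinates (ys, xs), clamping at the edges."""
    h, w = img.shape
    ys = np.clip(ys, 0.0, h - 1.0)
    xs = np.clip(xs, 0.0, w - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = img[y0, x0] * (1.0 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1.0 - wx) + img[y1, x1] * wx
    return top * (1.0 - wy) + bottom * wy


def constrained_warp(img, max_shift=2.0, grid=4, rng=None):
    """Warp a 2-D image with a smooth random field bounded by max_shift pixels.

    Hypothetical stand-in for a deformation-constrained warp: the bound on the
    displacement is fixed here, whereas the paper describes an adaptive one.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    # Coarse random offsets in [-max_shift, max_shift]; this bound is the constraint.
    coarse = rng.uniform(-max_shift, max_shift, size=(2, grid, grid))
    # Upsample the coarse field so that neighbouring pixels move together.
    gy, gx = np.meshgrid(
        np.linspace(0.0, grid - 1.0, h),
        np.linspace(0.0, grid - 1.0, w),
        indexing="ij",
    )
    dy = bilinear_sample(coarse[0], gy, gx)
    dx = bilinear_sample(coarse[1], gy, gx)
    # Resample the image at the displaced coordinates.
    yy, xx = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    return bilinear_sample(img, yy + dy, xx + dx)
```

In an input-transformation attack, a warp of this kind would typically be applied to several copies of the current adversarial example before the surrogate's gradients are averaged. A small `max_shift` changes local geometry while leaving the overall layout of the image recognizable, and setting it to zero returns the input unchanged.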
