Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models
YangTian Yan, Jinyu Tian
TL;DR
This work tackles the generation of Universal Adversarial Perturbations (UAPs) without access to data by exploiting the intrinsic vulnerability of L1LOS models, in which the nonlinear components have Lipschitz constant $1$ and the linear parts dominate perturbation amplification. The authors introduce IntriUAP, which aligns the input perturbation with the top right singular vectors of each linear layer to maximize perturbation propagation, and they optimize it through a data-free objective that uses only a few layers and no ground-truth labels. The theoretical analysis shows that the model's Lipschitz bound is governed by its linear operators and that the maximal perturbation aligns with the direction of the largest singular value, which supports a principled data-free attack. Empirically, IntriUAP achieves state-of-the-art performance among data-free methods on ImageNet, transfers well in black-box settings, and remains effective both under common defenses and when the attacker has access to only a subset of the victim model’s layers, which has significant practical security implications.
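The Lipschitz argument summarized above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: the model is written as $f = \phi_K \circ W_K \circ \cdots \circ \phi_1 \circ W_1$, each nonlinearity $\phi_k$ is $1$-Lipschitz in the $\ell_2$ norm, $\sigma_{\max}(W_k)$ is the largest singular value of $W_k$, and $v_1^{(k)}$ is the corresponding right singular vector.

$$
\|f(x+\delta) - f(x)\|_2 \;\le\; \Big(\prod_{k=1}^{K} \|W_k\|_2\Big)\,\|\delta\|_2 \;=\; \Big(\prod_{k=1}^{K} \sigma_{\max}(W_k)\Big)\,\|\delta\|_2,
\qquad
\max_{\|\delta\|_2 = 1} \|W_k\,\delta\|_2 \;=\; \sigma_{\max}(W_k) \ \text{ at } \ \delta = v_1^{(k)}.
$$

Under these assumptions the nonlinear layers contribute a factor of at most $1$ to the bound, so only the linear operators can amplify a perturbation, and a single linear layer amplifies it most along its top right singular vector.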
Abstract
Deep neural networks (DNNs) are susceptible to Universal Adversarial Perturbations (UAPs), which are instance-agnostic perturbations that can deceive a target model across a wide range of samples. Compared with instance-specific adversarial examples, UAPs are harder to craft because they must generalize across different samples and models. Generating UAPs typically requires access to numerous examples, which is a strong assumption in real-world tasks. In this paper, we propose a novel data-free method called Intrinsic UAP (IntriUAP), which exploits the intrinsic vulnerabilities of deep models. We analyze a series of popular deep models composed of linear layers and nonlinear layers with a Lipschitz constant of 1, revealing that the vulnerability of these models is predominantly influenced by their linear components. Based on this observation, we leverage the ill-conditioned nature of the linear components by aligning the UAP with the right singular vectors corresponding to the maximum singular value of each linear layer. Remarkably, our method achieves highly competitive performance in attacking popular image classification models without using any image samples. We also evaluate the black-box attack performance of our method, showing that it matches the state-of-the-art baseline for data-free methods on models that conform to our theoretical framework. Beyond the data-free assumption, IntriUAP also operates under a weaker assumption, in which the adversary can access only a few of the victim model's layers. Experiments demonstrate that the attack success rate decreases by only 4% when the adversary has access to just 50% of the linear layers in the victim model.
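The singular-vector alignment described in the abstract can be illustrated numerically for a single linear layer. The following is a minimal sketch under stated assumptions, not the authors' implementation: the weight matrix `W` is random rather than taken from a trained model, the helper `top_right_singular_vector` is a hypothetical name, and the perturbation is constrained to unit $\ell_2$ norm for simplicity, which may differ from the norm constraint used in the paper.

```python
import numpy as np


def top_right_singular_vector(W):
    """Return (sigma_max, v1) for a 2-D weight matrix W (hypothetical helper)."""
    # np.linalg.svd returns singular values in descending order,
    # and the rows of vh are the right singular vectors.
    _, s, vh = np.linalg.svd(W, full_matrices=False)
    return s[0], vh[0]


rng = np.random.default_rng(0)

# Assumption: a random matrix stands in for one linear layer (out_dim x in_dim).
W = rng.standard_normal((64, 128))

sigma_max, v1 = top_right_singular_vector(W)

# Output norm when the unit-norm perturbation is aligned with v1.
aligned_gain = np.linalg.norm(W @ v1)

# Output norms for random unit-norm perturbations, for comparison.
random_dirs = rng.standard_normal((1000, W.shape[1]))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
random_gains = np.linalg.norm(random_dirs @ W.T, axis=1)

print("sigma_max:         ", sigma_max)
print("aligned gain:      ", aligned_gain)
print("best random gain:  ", random_gains.max())
print("mean random gain:  ", random_gains.mean())
```

In this sketch the aligned perturbation is amplified by exactly the largest singular value, while no random direction exceeds it. How alignment is combined across several layers into a single objective is specific to the paper and is not reproduced here.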
