Understanding Model Ensemble in Transferable Adversarial Attack
Wei Yao, Zeliang Zhang, Huayi Tang, Yong Liu
TL;DR
This work addresses the theoretical understanding of transferable adversarial attacks generated via model ensembles. It formalizes transferability error $TE(z,\epsilon)$, diversity as prediction variance, and an empirical ensemble Rademacher complexity, then derives a vulnerability-diversity decomposition $TE(z,\epsilon)=L_P(z^*)-\underbrace{l(\tilde{f}(\theta;x),y)}_{\text{Vulnerability}}-\underbrace{\mathrm{Var}_{\theta} f(\theta;x)}_{\text{Diversity}}$, paired with an information-theoretic upper bound $TE(z,\epsilon) \le 4\mathcal{R}_{N}(\mathcal{Z}) + \sqrt{\frac{18 \gamma \beta^2}{N} \ln( \cdots )}$ that ties transferability to ensemble size, diversity, and complexity. The authors show that increasing the number of surrogate models, boosting their diversity, and reducing their complexity (to mitigate overfitting) tightens the bound and enhances transferability, a conclusion supported by extensive experiments on 54 surrogate models across multiple datasets. The results provide practical guidance for constructing more transferable ensemble attacks and offer insights relevant to defense by highlighting the trade-off between vulnerability and diversity. Overall, the paper advances the theoretical foundation for transferable model ensemble attacks and demonstrates its empirical validity.
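The vulnerability and diversity terms above can be made concrete with a small numerical sketch. It assumes a squared-error loss and scalar model outputs, which are simplifications chosen here for illustration and are not confirmed by the summary; the function name `vulnerability_diversity` and the toy predictions are hypothetical. Under these assumptions, the average loss over ensemble members splits exactly into the loss of the averaged prediction (vulnerability) plus the variance of the predictions (diversity).

```python
import numpy as np

def vulnerability_diversity(preds, y):
    """Split the average squared loss of an ensemble into two parts.

    preds: scalar outputs f(theta_i; x) of N surrogate models on one input
           (hypothetical toy values, not results from the paper).
    y:     scalar target.
    Assumes squared-error loss; this is an assumption of the sketch.
    """
    preds = np.asarray(preds, dtype=float)
    mean_pred = preds.mean()                 # averaged ensemble prediction
    vulnerability = (mean_pred - y) ** 2     # loss of the averaged prediction
    diversity = preds.var()                  # population variance across models
    avg_loss = ((preds - y) ** 2).mean()     # average loss over ensemble members
    return avg_loss, vulnerability, diversity

# Toy usage: four surrogate predictions on one adversarial input.
avg_loss, vul, div = vulnerability_diversity([0.9, 0.4, 0.7, 0.2], y=0.0)
# For squared loss the identity avg_loss == vul + div holds exactly.
assert np.isclose(avg_loss, vul + div)
```

In this toy setting, a fixed average loss can come from high vulnerability with low diversity or the reverse, which is one way to read the trade-off the summary mentions.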
Abstract
Model ensemble adversarial attacks have become a powerful method for generating transferable adversarial examples that can target even unknown models, but their theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attacks. We first define transferability error to measure the error in adversarial transferability, alongside the concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigorously explains the origin of transferability error in model ensemble attacks: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools from information theory to bound the transferability error using complexity and generalization terms, which yields three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.
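For readers unfamiliar with the setting, the following is a minimal sketch of a model ensemble attack: a single signed-gradient step that increases the average loss over several surrogate models. The linear surrogates, the logistic loss, and the names `ensemble_loss` and `ensemble_fgsm` are assumptions made for this illustration and are not the experimental setup of the paper.

```python
import numpy as np

def ensemble_loss(x, y, weights):
    """Average logistic loss of N linear surrogates on input x.

    x: (d,) input, y: label in {-1, +1}, weights: (N, d) surrogate parameters.
    All of these are hypothetical toy quantities.
    """
    margins = y * (weights @ x)
    return np.log1p(np.exp(-margins)).mean()

def ensemble_fgsm(x, y, weights, eps):
    """One signed-gradient step on the averaged surrogate loss.

    The perturbation stays inside an L-infinity ball of radius eps.
    """
    margins = y * (weights @ x)                     # (N,)
    # d/dx log(1 + exp(-m_i)) = -y * sigmoid(-m_i) * w_i
    coeff = -y / (1.0 + np.exp(margins))            # (N,)
    grad = (coeff[:, None] * weights).mean(axis=0)  # average over surrogates
    return x + eps * np.sign(grad)

# Toy usage with randomly drawn surrogates.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))   # 5 surrogate models, 8 input features
x = rng.normal(size=8)
x_adv = ensemble_fgsm(x, y=1.0, weights=W, eps=0.1)
```

Because the averaged logistic loss is convex in the input, this step never decreases the ensemble loss in the toy setting. Whether the resulting example also fools an unseen target model is the transferability question the abstract studies.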
