Understanding Model Ensemble in Transferable Adversarial Attack

Wei Yao, Zeliang Zhang, Huayi Tang, Yong Liu

TL;DR

This work addresses the theoretical understanding of transferable adversarial attacks generated via model ensembles. It formalizes transferability error $TE(z,\epsilon)$, diversity as prediction variance, and an empirical ensemble Rademacher complexity, then derives a vulnerability-diversity decomposition $TE(z,\epsilon)=L_P(z^*)-\underbrace{l(\tilde{f}(\theta;x),y)}_{\text{Vulnerability}}-\underbrace{\mathrm{Var}_{\theta} f(\theta;x)}_{\text{Diversity}}$, paired with an information-theoretic upper bound $TE(z,\epsilon) \le 4\mathcal{R}_{N}(\mathcal{Z}) + \sqrt{\frac{18 \gamma \beta^2}{N} \ln( \cdots )}$ that ties transferability to ensemble size, diversity, and complexity. The authors show that increasing the number of surrogate models, boosting their diversity, and reducing their complexity (to mitigate overfitting) tightens the bound and enhances transferability, a conclusion supported by extensive experiments on 54 surrogate models across multiple datasets. The results provide practical guidance for constructing more transferable ensemble attacks and offer insights relevant to defense by highlighting the trade-off between vulnerability and diversity. Overall, the paper advances the theoretical foundation for transferable model ensemble attacks and demonstrates its empirical validity.
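To make the vulnerability-diversity decomposition concrete, here is a minimal sketch under loudly stated assumptions. It uses squared loss, a uniform distribution over a finite set of surrogate models, and one scalar prediction $f(\theta_i;x)$ per model for the correct class with target $y$. Under these assumptions the average per-model loss splits exactly into the loss of the averaged prediction (vulnerability) plus the variance of the predictions (diversity). $TE(z,\epsilon)$ is then what remains after subtracting both terms from $L_P(z^*)$. The helper names and numbers are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance
from typing import Sequence


def vulnerability(preds: Sequence[float], y: float) -> float:
    """Squared loss of the averaged (expected) ensemble prediction."""
    return (fmean(preds) - y) ** 2


def diversity(preds: Sequence[float]) -> float:
    """Population variance of the ensemble members' predictions."""
    return pvariance(preds)


def expected_loss(preds: Sequence[float], y: float) -> float:
    """Mean squared loss over the members, i.e. L_P(z) under a uniform model distribution (assumption)."""
    return fmean((p - y) ** 2 for p in preds)


def transferability_error(best_loss: float, preds: Sequence[float], y: float) -> float:
    """Hypothetical helper: TE = L_P(z*) - vulnerability - diversity."""
    return best_loss - vulnerability(preds, y) - diversity(preds)


# Correct-class predictions of four hypothetical surrogates at the adversarial example z.
preds = [0.9, 0.6, 0.3, 0.2]
y = 1.0

# Bias-variance identity: expected loss = vulnerability + diversity.
assert abs(expected_loss(preds, y) - (vulnerability(preds, y) + diversity(preds))) < 1e-12
```

For a fixed $L_P(z^*)$, raising either the vulnerability or the diversity term lowers the transferability error. This is the trade-off the decomposition highlights.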

Abstract

Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigorously explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools in information theory to bound the transferability error using complexity and generalization terms, yielding three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.
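The empirical model ensemble Rademacher complexity is only named in the abstract. As an illustration, the sketch below assumes it follows the standard empirical Rademacher form. In that form, the $N$ surrogate models play the role of samples and candidate adversarial examples $z \in \mathcal{Z}$ index the function class: $\hat{\mathcal{R}}_N(\mathcal{Z}) = \mathbb{E}_{\sigma}\big[\sup_{z \in \mathcal{Z}} \frac{1}{N}\sum_{i=1}^{N} \sigma_i\, \ell(f(\theta_i; z), y)\big]$. Both this formula and the helper name are assumptions, not a restatement of Definition 3.4. For a small finite candidate set, the expectation can be computed exactly by enumerating all $2^N$ sign vectors.

```python
from itertools import product
from typing import Sequence


def empirical_ensemble_rademacher(loss_table: Sequence[Sequence[float]]) -> float:
    """Exact empirical Rademacher complexity over a finite candidate set (hypothetical helper).

    loss_table[k][i] is the loss of surrogate model i on candidate adversarial
    example z_k. The expectation over Rademacher signs is computed by
    enumerating all 2**N sign vectors, so N must stay small.
    """
    n_models = len(loss_table[0])
    if any(len(row) != n_models for row in loss_table):
        raise ValueError("every candidate needs one loss per surrogate model")
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=n_models):
        # Supremum over candidates of the sign-weighted average loss.
        total += max(
            sum(s * loss for s, loss in zip(signs, row)) / n_models
            for row in loss_table
        )
    return total / 2 ** n_models
```

Adding candidates to $\mathcal{Z}$ can never decrease this quantity, because the supremum is then taken over a larger set.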

Paper Structure

This paper contains 58 sections, 12 theorems, 91 equations, 6 figures, 3 tables.

Key Result

Lemma 3.2

The transferability error defined in Definition 3.1 is bounded by the largest absolute difference between the expected loss $L_P(z)$ and the empirical ensemble loss $L_E(z)$ over $z \in \mathcal{Z}$.
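The standard argument behind a bound of this kind can be sketched as follows. This sketch assumes that $TE(z,\epsilon) = L_P(z^*) - L_P(z)$, where $z^*$ maximizes the expected loss $L_P$ over $\mathcal{Z}$ and $z$ maximizes the empirical ensemble loss $L_E$ over $\mathcal{Z}$. Under squared loss, this form agrees with the decomposition in the TL;DR. The sketch is not necessarily the paper's own proof, and the paper's constant may differ.

$$
\begin{aligned}
TE(z,\epsilon) &= L_P(z^*) - L_P(z) \\
&= \underbrace{L_P(z^*) - L_E(z^*)}_{\le\, \sup_{z' \in \mathcal{Z}} |L_P(z') - L_E(z')|}
 + \underbrace{L_E(z^*) - L_E(z)}_{\le\, 0}
 + \underbrace{L_E(z) - L_P(z)}_{\le\, \sup_{z' \in \mathcal{Z}} |L_P(z') - L_E(z')|} \\
&\le 2 \sup_{z' \in \mathcal{Z}} \left| L_P(z') - L_E(z') \right|.
\end{aligned}
$$

The middle term is non-positive because $z$ maximizes $L_E$.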

Figures (6)

  • Figure 1: Vulnerability-diversity decomposition of transferability error. (a) The transferability error is defined as the difference in expected loss value between a given adversarial example and the most transferable one. (b) Vulnerability is the loss value of the expected ensemble classifier on the adversarial example. (c) Diversity is the variance in model ensemble predictions that correspond to the correct class.
  • Figure 2: Evaluation of ensemble attacks as the number of steps increases, using MLPs and CNNs on the MNIST dataset.
  • Figure 3: Evaluation of ensemble attacks as the number of steps increases, using MLPs and CNNs on the Fashion-MNIST dataset.
  • Figure 4: Evaluation of ensemble attacks as the number of steps increases, using MLPs and CNNs on the CIFAR-10 dataset.
  • Figure 5: Evaluation of ensemble attacks as the number of models increases, using MLPs and CNNs on the three datasets.
  • ...and 1 more figure
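The figures evaluate ensemble attacks as the number of steps and surrogate models grows. The sketch below illustrates the generic mechanics under stated assumptions. The surrogates are hypothetical linear binary classifiers with logistic loss. The attack is an I-FGSM-style signed-gradient ascent on the averaged ensemble loss, with the perturbation projected onto an $\ell_\infty$ ball of radius $\epsilon$. This is not the paper's experimental setup, which uses MLPs and CNNs, and all names and numbers are illustrative.

```python
import math
from typing import List, Sequence


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _margin(w: Sequence[float], x: Sequence[float], y: int) -> float:
    return y * sum(wi * xi for wi, xi in zip(w, x))


def ensemble_loss(weights, x, y) -> float:
    """Average logistic loss log(1 + exp(-y * w.x)) over the surrogates, with y in {-1, +1}."""
    total = 0.0
    for w in weights:
        m = _margin(w, x, y)
        # Stable softplus(-m) = max(-m, 0) + log1p(exp(-|m|)).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(weights)


def ensemble_grad(weights, x, y) -> List[float]:
    """Input gradient of ensemble_loss: the mean over models of -y * sigmoid(-m) * w."""
    grad = [0.0] * len(x)
    for w in weights:
        coeff = -y * _sigmoid(-_margin(w, x, y)) / len(weights)
        for j, wj in enumerate(w):
            grad[j] += coeff * wj
    return grad


def ensemble_ifgsm(weights, x0, y, eps, alpha, steps) -> List[float]:
    """Signed-gradient ascent on the ensemble loss, projected onto the l_inf ball around x0."""
    x = list(x0)
    for _ in range(steps):
        g = ensemble_grad(weights, x, y)  # gradient at the current iterate
        for j, gj in enumerate(g):
            direction = (gj > 0) - (gj < 0)  # sign(gj) in {-1, 0, 1}
            x[j] = min(max(x[j] + alpha * direction, x0[j] - eps), x0[j] + eps)
    return x


# Three hypothetical linear surrogates and one clean input labelled +1.
weights = [[1.0, -0.5], [0.8, 0.2], [0.5, 1.0]]
x0 = [0.6, 0.4]
x_adv = ensemble_ifgsm(weights, x0, y=1, eps=0.3, alpha=0.1, steps=5)
print(ensemble_loss(weights, x0, 1), ensemble_loss(weights, x_adv, 1))
```

Each surrogate's loss is convex in the input, so the averaged loss is convex as well. Every projected signed step moves each coordinate in the gradient's sign direction or leaves it unchanged, which means the ensemble loss never decreases from one step to the next.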

Theorems & Definitions (34)

  • Definition 3.1: Transferability Error
  • Lemma 3.2
  • Definition 3.3: Diversity of Model Ensemble Attack
  • Remark
  • Definition 3.4: Empirical Model Ensemble Rademacher Complexity
  • Theorem 4.1: Vulnerability-diversity Decomposition
  • Remark
  • Lemma 4.2: Ensemble Complexity of MLP
  • Remark
  • Theorem 4.3: Upper bound of Transferability Error
  • ...and 24 more