LOTOS: Layer-wise Orthogonalization for Training Robust Ensembles

Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran

TL;DR

The paper investigates adversarial transferability in model ensembles and identifies a trade-off: reducing the Lipschitz constant $L$ boosts per-model robustness but can increase the transferability rate $T_{rate}$ between ensemble members. To counteract this, LOTOS (Layer-wise Orthogonalization for Training Robust Ensembles) promotes orthogonality among the top-$k$ sub-spaces of corresponding affine layers across models, implemented as an additional loss term with weight $\frac{\text{loss}_{CE}}{M N (N-1)}$ and a parameter $\text{mal}$, with strong efficiency for convolutional layers (theoretical bound showing $k=1$ can be effective). Empirically, LOTOS lowers $T_{rate}$ and improves robust ensemble accuracy, e.g., approximately a 6 percentage-point gain on CIFAR-10 with ResNet-18 against black-box attacks, and up to an additional 10.7 percentage points when combined with prior robust-ensemble methods or adversarial training. LOTOS also works with heterogeneous architectures and can be integrated with adversarial training to further boost robustness, while maintaining modest computational overhead; its limitations include reduced accuracy of clipping in networks with batch-norm layers, which can temper gains in some settings.

Abstract

Transferability of adversarial examples is a well-known property that endangers all classification models, even those that are only accessible through black-box queries. Prior work has shown that an ensemble of models is more resilient to transferability: the probability that an adversarial example is effective against most models of the ensemble is low. Thus, most ongoing research focuses on improving ensemble diversity. Another line of prior work has shown that Lipschitz continuity can make models more robust, since it limits how much a model's output changes under small input perturbations. In this paper, we study the effect of Lipschitz continuity on transferability rates. We show that although a lower Lipschitz constant increases the robustness of a single model, it is less beneficial when training robust ensembles because it increases the transferability rate of adversarial examples across models in the ensemble. Therefore, we introduce LOTOS, a new training paradigm for ensembles, which counteracts this adverse effect. It does so by promoting orthogonality among the top-$k$ sub-spaces of the transformations of the corresponding affine layers of any pair of models in the ensemble. We theoretically show that $k$ does not need to be large for convolutional layers, which makes the computational overhead negligible. Through various experiments, we show LOTOS increases the robust accuracy of ensembles of ResNet-18 models by $6$ percentage points (p.p.) against black-box attacks on CIFAR-10. It can also be combined with prior state-of-the-art methods for training robust ensembles to enhance their robust accuracy by $10.7$ p.p.
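
To make the idea of promoting orthogonality among top-$k$ sub-spaces more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes dense weight matrices with the same input dimension in place of convolutional layers, and it assumes a hinge-style penalty with a threshold named mal after the parameter mentioned in the TL;DR. The function names top_k_right_singular_vectors and pairwise_overlap_penalty are hypothetical. The penalty is small when the directions one layer amplifies most are strongly shrunk by the other layer.

```python
import numpy as np

def top_k_right_singular_vectors(weight, k):
    """Return the k right singular vectors with the largest singular values, as rows."""
    _, _, vt = np.linalg.svd(weight, full_matrices=False)
    return vt[:k]  # np.linalg.svd sorts singular values in descending order

def pairwise_overlap_penalty(w_f, w_g, k=1, mal=0.5):
    """Illustrative penalty (an assumption, not the paper's exact loss).

    w_f, w_g: weight matrices of shape (out_features, in_features) for the
    corresponding layers of two ensemble members; in_features must match.
    The penalty grows when one layer maps the other's top-k input
    directions to outputs longer than the threshold `mal`.
    """
    v_f = top_k_right_singular_vectors(w_f, k)
    v_g = top_k_right_singular_vectors(w_g, k)
    f_on_g = np.linalg.norm(v_g @ w_f.T, axis=1)  # lengths of f(v) for g's top directions
    g_on_f = np.linalg.norm(v_f @ w_g.T, axis=1)  # lengths of g(v) for f's top directions
    hinge = np.maximum(f_on_g - mal, 0.0).sum() + np.maximum(g_on_f - mal, 0.0).sum()
    return float(hinge)

# Two layers that amplify different input directions incur no penalty,
# while two identical layers do.
w_a = np.diag([2.0, 0.1])
w_b = np.diag([0.1, 2.0])
print(pairwise_overlap_penalty(w_a, w_b))  # different top directions
print(pairwise_overlap_penalty(w_a, w_a))  # shared top direction
```

In an ensemble of $N$ models, a term of this kind would be summed over all pairs of models and over the layers being regularized, then added to the usual training loss. The exact weighting used by LOTOS is not reproduced in this sketch.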

Paper Structure (29 sections, 4 theorems, 15 equations, 9 figures, 6 tables)

Key Result

Proposition 3.3

Assume $\mathcal{X} = [0,1]^d$ and $\|\delta_x\| \leq \epsilon$. For two models $\mathcal{F}$ and $\mathcal{G}$, if the loss function on both for any $y \in \mathcal{Y}$ is $L$-Lipschitz with respect to the inputs, we have the following inequality:

Figures (9)

  • Figure 1: Accuracy vs. Robust Accuracy vs. Transferability: Changes in the average accuracy and robust accuracy of individual ResNet-18 models, along with the average transferability rate between any pair of the models in each ensemble, as the layer-wise clipping value (spectral norm) changes. As the plots show, although the robustness of individual models increases as the clipping value decreases, the transferability rate among the models also increases, which might offset the benefits of clipping for the robustness of the whole ensemble.
  • Figure 2: Accuracy vs. Robust Accuracy vs. Transferability: Changes in the average accuracy and robust accuracy of individual ResNet-18 models (with batch norm layers), along with the average transferability rate between any pair of the models in each ensemble, as the layer-wise clipping value changes. As the plots show, although the robustness of individual models increases as the clipping value decreases, the $T_{rate}$ among the models also increases, which might offset the benefits of clipping for the robustness of the whole ensemble.
  • Figure 3: Reducing transferability while maintaining the benefits of Lipschitzness. Evaluation of the average test accuracy (left-most plot) and average robust accuracy (middle plot) of the individual models in an ensemble of three ResNet-18 models, along with the $T_{rate}$ of adversarial examples between the models of the ensemble using the white-box setting (see Appendix \ref{sec:setup}). LOTOS keeps the robust accuracy of individual models in the ensemble much higher than those of the Orig ensemble and as $\text{mal}$ increases, it becomes more similar to the models in $C=1$. On the other hand, LOTOS leads to a much lower transferability (about $20\%$) and the difference increases as $\text{mal}$ decreases (right-most plot). These benefits come at a slight cost to the average accuracy of the individual models (left-most plot).
  • Figure 4: (Left) Effect of $k$. As the plot shows, the transferability slightly decreases (up to $1\%$) as $k$ gets larger up to some point (here $k=15$) and then starts to increase ($k=20$). (Right) First layer might be enough! Comparing the effect of applying LOTOS to only the first layer rather than all the convolutional layers. The similarity of the results suggests that LOTOS can be effective for heterogeneous models, where it can be applied only to the first layers.
  • Figure 5: Investigating the effect of LOTOS on the average accuracy and robust accuracy of each of the models of heterogeneous ensembles of DLA, ResNet-18, and ResNet-34 models, along with presenting the average transferability among any pair of the models in the ensemble as the training proceeds. As the plots show, LOTOS leads to a lower transferability among the models while maintaining the benefits of controlling the Lipschitz constant on the robustness of individual models.
  • ...and 4 more figures
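
The figure captions above refer to a layer-wise clipping value on the spectral norm. As a rough illustration of what clipping means, the sketch below caps the singular values of a dense weight matrix at a value c using an SVD. This is an assumption-laden simplification: the experiments in the figures involve convolutional ResNet-18 layers, whose clipping procedure is not described in this section, and the function names clip_spectral_norm and lipschitz_upper_bound are hypothetical.

```python
import numpy as np

def clip_spectral_norm(weight, c):
    """Cap every singular value of a dense weight matrix at c (illustrative only)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u * np.minimum(s, c)) @ vt  # rebuild the matrix from the clipped spectrum

def lipschitz_upper_bound(weights):
    """Product of layer spectral norms.

    This upper-bounds the Lipschitz constant (in the 2-norm) of a
    feed-forward stack of these linear layers when the activations
    between them are 1-Lipschitz, such as ReLU.
    """
    return float(np.prod([np.linalg.norm(w, 2) for w in weights]))

layers = [np.diag([3.0, 0.5]), np.diag([2.0, 1.5])]
clipped = [clip_spectral_norm(w, 1.0) for w in layers]
print(lipschitz_upper_bound(layers))   # bound before clipping
print(lipschitz_upper_bound(clipped))  # bound after clipping each layer at c = 1
```

A smaller clipping value tightens this bound for each individual model, which is the per-model robustness benefit that the captions contrast with the rise in transferability.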

Theorems & Definitions (9)

  • Definition 3.1: Attack Algorithm
  • Definition 3.2: Transferability Rate
  • Proposition 3.3
  • Theorem 4.1
  • proof
  • Lemma A.1
  • proof
  • Corollary A.2
  • proof
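
Definition 3.2 names a transferability rate, but its formal statement is not reproduced in this section. The sketch below shows one common empirical estimate, given here as an assumption rather than the paper's definition: among inputs that both models classify correctly and on which the adversarial example fools the source model, it measures the fraction that also fool the target model. The function name transferability_rate and its arguments are hypothetical.

```python
import numpy as np

def transferability_rate(y_true, src_clean, src_adv, tgt_clean, tgt_adv):
    """Empirical transfer rate from a source model to a target model.

    All arguments are 1-D arrays of class labels:
    y_true               - ground-truth labels
    src_clean, tgt_clean - predictions on the clean inputs
    src_adv, tgt_adv     - predictions on adversarial inputs crafted on the source
    The conditioning used here is an assumption, not the paper's exact definition.
    """
    y_true, src_clean, src_adv, tgt_clean, tgt_adv = map(
        np.asarray, (y_true, src_clean, src_adv, tgt_clean, tgt_adv)
    )
    # Keep samples both models got right and where the attack fooled the source.
    eligible = (src_clean == y_true) & (tgt_clean == y_true) & (src_adv != y_true)
    if not eligible.any():
        return float("nan")  # no successful source attacks to transfer
    return float(np.mean(tgt_adv[eligible] != y_true[eligible]))

rate = transferability_rate(
    y_true=[0, 1, 2, 3],
    src_clean=[0, 1, 2, 0],
    src_adv=[1, 1, 0, 0],
    tgt_clean=[0, 1, 2, 3],
    tgt_adv=[1, 1, 2, 3],
)
print(rate)
```

Averaging this quantity over every ordered pair of ensemble members would give a single number comparable in spirit to the average transferability rate described in the figure captions.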