MALT Powers Up Adversarial Attacks

Odelia Melamed, Gilad Yehudai, Adi Shamir

TL;DR

MALT introduces Mesoscopic Almost Linearity Targeting to improve adversarial attacks by reordering target classes using a mesoscopic linearity-inspired score, enabling targeted APGD attacks to reach more samples faster. The authors provide theoretical and empirical support for mesoscopic almost linearity in neural networks and demonstrate that MALT achieves up to ~5x faster attack times while matching or exceeding AutoAttack’s success on CIFAR-100 and ImageNet across RobustBench models. The approach relies on normalizing class confidence by the gradient difference between class logits, preserving effective targeting even along adversarial trajectories. Practically, MALT offers a scalable, hardware-friendly alternative for evaluating robustness and highlights the persistence of almost-linear behavior in neural networks at mesoscopic scales.
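
The reordering described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `malt_target_order`, the `num_targets` parameter, the sign convention (smaller score means a more promising target), and the toy numbers are all assumptions made for the example. It also assumes the per-class input gradients are already available as an array.

```python
import numpy as np

def malt_target_order(logits, grads, true_class, num_targets=3):
    """Rank candidate target classes by a MALT-style score (hypothetical helper).

    logits: shape (k,), logit F_j(x) for every class j.
    grads:  shape (k, d), row j is the gradient of F_j w.r.t. the flattened input.
    Score for class j: (F_y(x) - F_j(x)) / ||grad F_y(x) - grad F_j(x)||_2,
    i.e. the logit gap normalized by the gradient difference. Smaller is better.
    """
    logits = np.asarray(logits, dtype=float)
    grads = np.asarray(grads, dtype=float)
    candidates = np.array([j for j in range(len(logits)) if j != true_class])
    gaps = logits[true_class] - logits[candidates]
    grad_norms = np.linalg.norm(grads[true_class] - grads[candidates], axis=1)
    scores = gaps / np.maximum(grad_norms, 1e-12)  # guard against zero norms
    return candidates[np.argsort(scores, kind="stable")][:num_targets]

# Toy example (made-up numbers): class 0 is the true class.
logits = [5.0, 4.0, 3.0, 1.0]
grads = [[1.0, 0.0],    # gradient of F_0
         [1.1, 0.0],    # F_1: almost parallel to F_0, so the gap closes slowly
         [-3.0, 0.0],   # F_2: very different gradient, so the gap closes fast
         [0.0, 1.0]]    # F_3
order = malt_target_order(logits, grads, true_class=0)
naive = np.argsort(-np.asarray(logits))[1:]  # top-logit ordering, true class dropped
```

In this toy case the top-logit ordering is classes 1, 2, 3, while the normalized score ranks them 2, 3, 1: class 1 has the highest competing logit, but its gradient is nearly identical to that of the true class, so a small input change barely moves the gap between them.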

Abstract

Current adversarial attacks for multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for various target classes. We present a novel adversarial targeting method, MALT - Mesoscopic Almost Linearity Targeting, based on medium-scale almost linearity assumptions. Our attack wins over the current state-of-the-art AutoAttack on the standard benchmark datasets CIFAR-100 and ImageNet and for a variety of robust models. In particular, our attack is five times faster than AutoAttack, while successfully matching all of AutoAttack's successes and attacking additional samples that were previously out of reach. We then prove formally and demonstrate empirically that our targeting method, although inspired by linear predictors, also applies to standard non-linear models.

Paper Structure

This paper contains 34 sections, 7 theorems, 23 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

Consider a linear predictor over $k$ classes of the form $F(\mathbf{x}) = W\mathbf{x} + \mathbf{b}$ where $\mathbf{x}\in {\mathbb R}^d$, $W\in{\mathbb R}^{k\times d}$ and $\mathbf{b}\in {\mathbb R}^k$. Denote the $i$-th row of $W$ by $\mathbf{w}_i$ and by $F_i(\mathbf{x}) = \langle \mathbf{w}_i,\mathbf{x}\rangle + b_i$ ...
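
The lemma statement is cut off in this excerpt, so its exact conclusion is not reproduced here. As background for the linear setting it considers, the following is the standard point-to-hyperplane computation, written with $F_i(\mathbf{x}) = \langle \mathbf{w}_i, \mathbf{x}\rangle + b_i$ for the $i$-th output of $F$. The symbols $y$ (the predicted class), $j$ (a candidate target), and $\mathbf{z}$ (a generic point) are introduced only for this illustration, and the computation assumes $F_y(\mathbf{x}) > F_j(\mathbf{x})$ and $\mathbf{w}_y \neq \mathbf{w}_j$.

$$
\{\mathbf{z} : F_y(\mathbf{z}) = F_j(\mathbf{z})\}
= \{\mathbf{z} : \langle \mathbf{w}_y - \mathbf{w}_j, \mathbf{z}\rangle + (b_y - b_j) = 0\}
$$

$$
\min_{\mathbf{z}\,:\,F_y(\mathbf{z}) = F_j(\mathbf{z})} \|\mathbf{z} - \mathbf{x}\|_2
= \frac{\langle \mathbf{w}_y - \mathbf{w}_j, \mathbf{x}\rangle + (b_y - b_j)}{\|\mathbf{w}_y - \mathbf{w}_j\|_2}
= \frac{F_y(\mathbf{x}) - F_j(\mathbf{x})}{\|\mathbf{w}_y - \mathbf{w}_j\|_2}
$$

For a linear predictor, then, the class whose decision boundary is nearest to $\mathbf{x}$ minimizes the logit gap divided by the norm of the weight difference. This need not be the class with the highest competing logit.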

Figures (8)

  • Figure 1: Examples of images from the ImageNet dataset that AutoAttack fails to attack while MALT succeeds. The top row shows an APGD attack on the target class with the highest logit, and the bottom row shows a successful APGD attack on the class that MALT finds, which is the class with the (a) $18$th and (b) $52$nd highest logit. The images are shown before and after the attack, and the change in logits is presented in the middle column.
  • Figure 2: Experimental measurement of mesoscopic almost linearity when taking a step $\mathbf{v}$ away from a test image $\mathbf{x}_0$, for CIFAR-100 and ImageNet. The results are averaged over all the images in the test set, where $\mathbf{v}$ is (a) a random step and (b) a step in the direction of the gradient (adversarial step).
  • Figure 3: Empirical mesoscopic almost linearity: the changes in the logits from an image $\mathbf{x}_0$ to its adversarial example. The third row plots the changes in the model's output logits, and the bottom row plots the results of the linear approximation of the model at $\mathbf{x}_0$.
  • Figure 4: Comparing targeting methods for the SOTA model of liu2023comprehensive: the number of successful attacks for each target order under two targeting methods. Blue shows MALT targeting with APGD, and orange shows APGD with the top-logits targeting performed in AutoAttack.
  • Figure 5: Additional examples of images from the ImageNet dataset that AutoAttack fails to attack while MALT succeeds. The top row shows an APGD attack on the target class with the highest logit, and the bottom row shows a successful APGD attack on the class that MALT finds. (a) and (b) are examples from the Swin-L network of liu2023comprehensive. (c) through (e) are from the ConvNext-L network of liu2023comprehensive. The images are shown before and after the attack, and the change in logits is presented in the middle column.
  • ...and 3 more figures
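
The captions for Figures 2 and 3 compare a model's actual logit change along a step $\mathbf{v}$ with the change predicted by its linear approximation at $\mathbf{x}_0$. The sketch below illustrates that kind of comparison on a tiny two-layer ReLU network. The exact metric used in the paper is not given in this excerpt, so the relative-error measure `linearization_gap`, the toy network, and all numbers are assumptions made for illustration only.

```python
import numpy as np

def mlp_logits(x, W1, b1, W2, b2):
    """Logits of a two-layer ReLU network (toy stand-in for a real model)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def mlp_jacobian(x, W1, b1, W2):
    """Jacobian of the logits with respect to x, shape (k, d)."""
    active = (W1 @ x + b1 > 0).astype(float)  # ReLU derivative per hidden unit
    return W2 @ (active[:, None] * W1)

def linearization_gap(x0, v, params):
    """Relative error between the true logit change and its first-order estimate.

    Hypothetical measure: ||(F(x0 + v) - F(x0)) - J(x0) v|| / ||F(x0 + v) - F(x0)||.
    A value near 0 means the network behaves almost linearly along v.
    """
    W1, b1, W2, b2 = params
    exact = mlp_logits(x0 + v, W1, b1, W2, b2) - mlp_logits(x0, W1, b1, W2, b2)
    linear = mlp_jacobian(x0, W1, b1, W2) @ v
    return np.linalg.norm(exact - linear) / max(np.linalg.norm(exact), 1e-12)

# Toy parameters: 3 inputs, 3 hidden units, 2 classes.
W1 = np.eye(3)
b1 = np.zeros(3)
W2 = np.array([[1.0, 2.0, 3.0],
               [0.0, 1.0, 0.0]])
b2 = np.zeros(2)
params = (W1, b1, W2, b2)
x0 = np.ones(3)

gap_small = linearization_gap(x0, np.array([0.1, 0.2, -0.3]), params)  # no ReLU flips
gap_large = linearization_gap(x0, np.array([-2.0, 0.0, 0.0]), params)  # one ReLU flips
```

With these toy parameters the small step stays inside a single linear region of the network, so the gap is zero up to rounding. The larger step switches off one hidden unit, and the first-order estimate overshoots the true change. Averaging such a quantity over test images and step sizes is one way to probe how far the almost-linear regime extends.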

Theorems & Definitions (11)

  • Lemma 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Corollary 4.3
  • Remark 4.4: Assumption in the theoretical part
  • proof: Proof of Lemma 3.1
  • proof
  • proof
  • Lemma C.1
  • Lemma C.2: Theorem A.2 from melamed2023adversarial
  • ...and 1 more