Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing

Juanjuan Weng; Zhiming Luo; Shaozi Li

TL;DR

The paper tackles the limited transferability of targeted black-box adversarial attacks and introduces two complementary techniques: normalized logit calibration (NLC) and truncated feature mixing (TFM). NLC calibrates the logits by jointly leveraging the logit margin and the standard deviation of the logits, yielding a normalized cross-entropy loss $L_{NCE}$ that boosts targeted success rates across models. TFM removes the Rank-1 feature obtained via SVD and mixes the truncated clean features with the adversarial ones, reducing reliance on the source model and further improving transferability. On ImageNet-Compatible and CIFAR-10, the proposed methods yield substantial gains over state-of-the-art baselines, including against adversarially trained and transformer-based target models, and they can be combined with existing attack techniques for broader applicability.
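The idea of a normalized cross-entropy loss can be illustrated with a minimal Python sketch. It assumes one specific calibration: standardize the logits (subtract their mean, divide by their standard deviation) and then divide by a temperature. This formula, the helper names `normalized_logits`, `cross_entropy`, and `nce_loss`, and the `temperature` parameter are hypothetical illustrations rather than the paper's definitions, and the paper's treatment of the logit margin may differ. The sketch shows one consequence of normalizing by the standard deviation: the loss no longer depends on the overall shift or scale of the logits.

```python
import math
from typing import List


def normalized_logits(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Hypothetical normalized calibration: (z - mean(z)) / (temperature * std(z)).

    Assumption for illustration only; the paper's calibration (including how it
    uses the logit margin) may differ. `temperature` must be positive.
    """
    n = len(logits)
    mean = sum(logits) / n
    std = math.sqrt(sum((z - mean) ** 2 for z in logits) / n)
    std = max(std, 1e-12)  # guard against constant logits (zero spread)
    return [(z - mean) / (temperature * std) for z in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    """Standard cross-entropy -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def nce_loss(logits: List[float], target: int, temperature: float = 1.0) -> float:
    """Cross-entropy on the calibrated logits; a targeted attack minimizes this for the target class."""
    return cross_entropy(normalized_logits(logits, temperature), target)
```

For example, `nce_loss([2.0, 1.0, 0.0], 0)` and `nce_loss([20.0, 10.0, 0.0], 0)` agree up to rounding, whereas plain `cross_entropy([20.0, 10.0, 0.0], 0)` is already close to zero, so its gradient with respect to the logits (softmax minus one-hot) nearly vanishes.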

Abstract

This paper aims to enhance the transferability of adversarial samples in targeted attacks, where attack success rates remain comparatively low. To this end, we propose two techniques that improve targeted transferability from the loss and the feature perspectives, respectively. First, the logit calibrations used in previous targeted attacks focus primarily on the logit margin between the targeted class and the untargeted classes, neglecting the standard deviation of the logits. In contrast, we introduce a new normalized logit calibration method that jointly considers the logit margin and the standard deviation of the logits. This approach effectively calibrates the logits and enhances targeted transferability. Second, previous studies have shown that mixing the features of clean samples during optimization can significantly increase transferability. Building on this, we investigate a truncated feature mixing method that reduces the influence of the source training model, yielding further improvements. The truncated feature is obtained by removing the Rank-1 feature associated with the largest singular value from the singular value decomposition of the clean sample's high-level convolutional features. Extensive experiments on the ImageNet-Compatible and CIFAR-10 datasets demonstrate the individual and mutual benefits of the two proposed components, which outperform state-of-the-art methods by a large margin in black-box targeted attacks.
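The truncated feature described above can be sketched in plain Python. The sketch assumes the clean sample's high-level convolutional feature map has already been reshaped into a 2-D matrix (for example, channels by spatial positions; this layout is an assumption). Instead of calling a full SVD routine, it estimates only the leading singular triplet (σ1, u1, v1) by power iteration on A^T A, which is all that is needed for the Rank-1 term, subtracts σ1·u1·v1^T, and then blends the result with the adversarial feature through a convex combination with a hypothetical ratio `alpha`. The helper names and the mixing rule are illustrative assumptions; the paper's exact mixing scheme may differ.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def leading_singular_triplet(a: Matrix, iters: int = 200, seed: int = 0) -> Tuple[float, List[float], List[float]]:
    """Estimate (sigma_1, u_1, v_1) of `a` by power iteration on a^T a."""
    rows, cols = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = a v
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = a^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # a v vanished (e.g. zero matrix): nothing to remove
            break
        v = [x / norm for x in v]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma > 0.0 else av
    return sigma, u, v


def remove_rank1(feature: Matrix) -> Matrix:
    """Truncated feature: subtract the Rank-1 term sigma_1 * u_1 v_1^T from the clean feature matrix."""
    sigma, u, v = leading_singular_triplet(feature)
    return [[feature[i][j] - sigma * u[i] * v[j] for j in range(len(feature[0]))]
            for i in range(len(feature))]


def mix_features(adv: Matrix, clean_truncated: Matrix, alpha: float) -> Matrix:
    """Hypothetical convex mixing: (1 - alpha) * adversarial + alpha * truncated clean feature."""
    return [[(1.0 - alpha) * x + alpha * y for x, y in zip(row_a, row_c)]
            for row_a, row_c in zip(adv, clean_truncated)]


clean = [[3.0, 0.0], [0.0, 1.0]]
truncated = remove_rank1(clean)  # approximately [[0.0, 0.0], [0.0, 1.0]]
mixed = mix_features([[1.0, 1.0], [1.0, 1.0]], truncated, alpha=0.25)
```

Power iteration is used here only to keep the sketch dependency-free; a full SVD from a numerical library would give the same Rank-1 term for matrices with a distinct largest singular value.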

Paper Structure

This paper contains 30 sections, 11 equations, 3 figures, 9 tables.

Introduction
Related Works
Untargeted Black-Box Attacks
Targeted Black-Box Attacks
Method
Problem Definition
Normalized Logit Calibration & Loss
Revisiting Logit Loss and Logit Calibration
Normalized Logit Calibration
Loss Function
Truncated Feature Mixing
Rank-1 Feature Removing
Feature Mixing
Experiments
Experimental Settings
...and 15 more sections

Figures (3)

Figure 1: Density distributions of the standard deviation of the logits and of the logit margins, computed over the 1000 images of the ImageNet-Compatible dataset.
Figure 2: Overview of the Truncated Feature Mixing procedure.
Figure 3: Targeted attack success rates (%) as a function of the number of iterations. Best viewed in color.
