DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross-Domain

Jun Liu; Jiantao Zhou; Jiandian Zeng; Jinyu Tian; Zheng Li

TL;DR

DifAttack++ targets efficient score-based black-box adversarial attacks by introducing a hierarchical disentangled feature space that separately encodes an adversarial feature (AF) and a visual feature (VF). The method trains two cross-domain autoencoders, each equipped with a Hierarchical Decouple-Fusion (HDF) module, to disentangle AFs from VFs and to reconstruct images; at attack time it optimizes the AF under query feedback while keeping the VF fixed to craft adversarial examples. Experiments on close-set and open-set tasks (ImageNet, Food101, ObjectNet, CLIP models, and the real-world Imagga API) report a higher Attack Success Rate (ASR) and fewer queries than state-of-the-art baselines, while preserving high visual quality. Extensive ablations and analyses indicate that the disentanglement decouples adversarial capability from appearance, which supports robust performance against defenses and in cross-domain scenarios where the training data is unknown.

Abstract

This work investigates efficient score-based black-box adversarial attacks with a high Attack Success Rate (ASR) and good generalizability. We design a novel attack method based on a hierarchical DIsentangled Feature space, called DifAttack++, which differs significantly from existing methods that operate over the entire feature space. Specifically, DifAttack++ first disentangles an image's latent feature into an Adversarial Feature (AF) and a Visual Feature (VF) via an autoencoder equipped with our specially designed Hierarchical Decouple-Fusion (HDF) module, where the AF dominates the adversarial capability of an image, while the VF largely determines its visual appearance. We train two such autoencoders, one for the clean image domain and one for the adversarial image domain (i.e., cross-domain), to achieve image reconstruction and feature disentanglement, using pairs of clean images and their Adversarial Examples (AEs) generated from available surrogate models via white-box attack methods. Finally, in the black-box attack stage, DifAttack++ iteratively optimizes the AF according to the query feedback from the victim model until a successful AE is generated, while keeping the VF unaltered. Extensive experimental results demonstrate that DifAttack++ achieves higher ASR and better query efficiency than state-of-the-art methods, while producing AEs of much better visual quality. The code is available at https://github.com/csjunjun/DifAttack.git.
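
The attack stage described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the linear `decode`, the toy `victim_scores` classifier, the margin loss, the NES-style gradient estimator, and all hyperparameters are assumptions chosen only to show the idea of updating the AF from score feedback while the VF stays fixed and the perturbation stays inside an L-infinity budget.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_AF, D_VF, N_CLASSES = 32, 8, 8, 3

# Hypothetical stand-ins for the pre-trained decoder and the victim model.
W_af = rng.normal(size=(D_IMG, D_AF))
W_vf = rng.normal(size=(D_IMG, D_VF))
W_victim = rng.normal(size=(N_CLASSES, D_IMG))


def decode(af, vf):
    """Toy decoder: fuse an AF and a VF back into an 'image' vector."""
    return W_af @ af + W_vf @ vf


def victim_scores(x):
    """Toy score-based black box: only softmax probabilities are exposed."""
    logits = W_victim @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()


def margin_loss(scores, label):
    """Positive while the true label wins; <= 0 means misclassification."""
    return scores[label] - np.delete(scores, label).max()


def attack(x, af, vf, label, eps=0.5, sigma=0.1, lr=0.5, pop=10,
           max_queries=2000):
    """Optimize only the AF with an NES-style estimator; the VF is fixed."""
    queries = 0

    def query(a):
        nonlocal queries
        queries += 1
        # Project the decoded image onto the L-infinity ball around x.
        x_adv = x + np.clip(decode(a, vf) - x, -eps, eps)
        return margin_loss(victim_scores(x_adv), label), x_adv

    loss, x_adv = query(af)
    while loss > 0 and queries + 2 * pop + 1 <= max_queries:
        grad = np.zeros_like(af)
        for _ in range(pop):
            u = rng.normal(size=af.shape)
            l_plus, _ = query(af + sigma * u)   # antithetic sample pair
            l_minus, _ = query(af - sigma * u)
            grad += (l_plus - l_minus) * u
        grad /= 2 * sigma * pop
        af = af - lr * grad                     # descend on the margin loss
        loss, x_adv = query(af)
    return x_adv, queries, bool(loss <= 0)


af0, vf0 = rng.normal(size=D_AF), rng.normal(size=D_VF)
x = decode(af0, vf0)                            # toy "clean image"
label = int(np.argmax(victim_scores(x)))
x_adv, n_queries, success = attack(x, af0, vf0, label)
print(success, n_queries, float(np.abs(x_adv - x).max()))
```

Every call to `query` counts as one victim query, so `n_queries` plays the role of the query count that the paper averages into Avg.Q. Whether the toy attack succeeds depends on the random weights; the sketch only guarantees that the perturbation respects `eps` and that the query budget is not exceeded.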

Paper Structure (25 sections, 22 equations, 4 figures, 8 tables, 1 algorithm)

Introduction
Related Works
Score-based Black-box Attack Methods
Disentangled Representation
Proposed DifAttack++
Training Stage: Train Autoencoders $\mathcal{G}$ and $\mathcal{G}^*$
Disentanglement.
Reconstruction.
Attack Stage: Generate the AE $\mathbf{x}^\prime$
Close-set scenarios
Open-set scenarios
Experiments
Experiment Setup
Datasets.
Classifiers.
...and 10 more sections

Figures (4)

Figure 1: (a) The training procedure of autoencoders $\mathcal{G}$ and $\mathcal{G}^*$ equipped with our proposed HDF module for disentangling adversarial and visual features. (b) The proposed black-box adversarial attack method, i.e. DifAttack++, incorporated with the pre-trained $\mathcal{G}^*$.
Figure 2: Visualization of AEs produced by MCGSquare [andriushchenko2020square] and DifAttack++.
Figure 3: The test Avg.Q with respect to the training time, when using DifAttack or DifAttack++ to perform untargeted attacks on ImageNet.
Figure 4: Visualization of the disentangled representation in DifAttack++. Images $\tilde{\mathbf{x}}_1$ and $\tilde{\mathbf{x}}_2$ are the versions of $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively, reconstructed by our pre-trained autoencoder $\mathcal{G}$. The image $\tilde{\mathbf{x}}_{v1a2}$ is reconstructed by $\mathcal{G}$ using the VFs of $\mathbf{x}_1$ and the AFs of $\mathbf{x}_2$, while $\tilde{\mathbf{x}}_{v2a1}$ is reconstructed by $\mathcal{G}$ using the VFs of $\mathbf{x}_2$ and the AFs of $\mathbf{x}_1$.
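
The feature swap in Figure 4 can be sketched as follows. This is a toy illustration, not the HDF module: the linear encoder heads `E_af` and `E_vf`, the fusion matrix `F`, and the dimensions are hypothetical, and the weights are random rather than trained, so the outputs show only how the cross-reconstructions are assembled.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_FEAT = 16, 6

# Hypothetical linear stand-ins for the encoder heads and the fusion decoder.
E_af = rng.normal(size=(D_FEAT, D_IMG))
E_vf = rng.normal(size=(D_FEAT, D_IMG))
F = rng.normal(size=(D_IMG, 2 * D_FEAT))


def encode(x):
    """Split an image into an adversarial feature and a visual feature."""
    return E_af @ x, E_vf @ x


def fuse(af, vf):
    """Reconstruct an image from one AF and one VF."""
    return F @ np.concatenate([af, vf])


x1, x2 = rng.normal(size=D_IMG), rng.normal(size=D_IMG)
af1, vf1 = encode(x1)
af2, vf2 = encode(x2)

x_tilde_1 = fuse(af1, vf1)  # plain reconstruction of x1
x_v1a2 = fuse(af2, vf1)     # VF of x1 combined with AF of x2
x_v2a1 = fuse(af1, vf2)     # VF of x2 combined with AF of x1
```

With a trained autoencoder, the caption's claim corresponds to `x_v1a2` looking like `x1` while inheriting the adversarial behavior of `x2`; the random weights here cannot demonstrate that property.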
