Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach

Hangyu Liu, Bo Peng, Pengxiang Ding, Donglin Wang

TL;DR

This work shows that encoding target labels as 1D vectors limits the fidelity and transferability of multi-target adversarial attacks. It introduces TGAF, which uses diffusion models to encode targets as 2D semantic tensors and fuses them with image features through convolution-based and transformer-based modules, aided by a random masking strategy during training. TGAF achieves higher targeted transferability than prior methods on both normally trained and robust models, remains effective under multiple defense mechanisms, and maintains competitive perceptual quality of adversarial examples. The approach offers a practical framework for evaluating model robustness under black-box attacks and highlights the potential of diffusion-guided 2D representations in adversarial generation.

Abstract

Compared to single-target adversarial attacks, multi-target attacks have garnered significant attention due to their ability to generate adversarial images for multiple target classes simultaneously. However, existing generative approaches for multi-target attacks primarily encode target labels into one-dimensional tensors, leading to a loss of fine-grained visual information and overfitting to model-specific features during noise generation. To address this gap, we first identify and validate that the semantic feature quality and quantity are critical factors affecting the transferability of targeted attacks: 1) Feature quality refers to the structural and detailed completeness of the implanted target features, as deficiencies may result in the loss of key discriminative information; 2) Feature quantity refers to the spatial sufficiency of the implanted target features, as inadequacy limits the victim model's attention to this feature. Based on these findings, we propose the 2D Tensor-Guided Adversarial Fusion (TGAF) framework, which leverages the powerful generative capabilities of diffusion models to encode target labels into two-dimensional semantic tensors for guiding adversarial noise generation. Additionally, we design a novel masking strategy tailored for the training process, ensuring that parts of the generated noise retain complete semantic information about the target class. Extensive experiments demonstrate that TGAF consistently surpasses state-of-the-art methods across various settings.

Paper Structure

This paper contains 27 sections, 12 equations, 7 figures, 17 tables.

Figures (7)

  • Figure 1: Comparison of multi-target approaches. Previous methods (top row) use 1D label encoding to guide noise generation for adversarial examples (AE). However, this often loses fine-grained details because images are 2D, potentially leading to overfitting. Our method (bottom row) generates 2D latent representations from target labels, better preserving structural information.
  • Figure 2: (a) Visual comparison of C-GSP, CGNC, and TGAF. Each row displays an adversarial example and its corresponding perturbation map for a distinct target label: "barometer" (first row) and "fig" (second row). The surrogate model used is Inc-v3. TGAF demonstrably surpasses C-GSP and CGNC by more effectively capturing both the target semantic details (e.g., the barometer's pointer) and the target semantic quantity (e.g., the number of figs). (b) Quantitative analysis of feature quantity and feature quality. Feature quantity is measured by the percentage of high-attention area identified via Grad-CAM. Feature quality is measured by the cosine similarity between the perturbation's feature vector and the average feature vector of real target class images. Experiments are conducted on 1000 images.
  • Figure 3: Demonstrating the impact of feature implantation. The heatmaps are generated on Res-152 using Grad-CAM. Left: the original image and heatmap. Middle: the masked image and heatmap (trench coat buttons masked). Right: the perturbation and heatmap. The results confirm that 1) insufficient feature detail reduces accuracy, and 2) implanting more features yields a higher target probability.
  • Figure 4: The framework of 2D Tensor-Guided Adversarial Fusion (TGAF). (a) Overall architecture; (b) Text-to-Image Encoder; (c) Convolution-based Fusion (CbF) module; (d) Transformer-based Fusion (TbF) module; (e) Feature Integration Module. Abbreviations: OIE (Original Image Embeddings), FE (Fusion Embeddings), TIE (Target Image Embeddings).
  • Figure 5: Prompts used to produce the latent tensors for generating the target image.
  • ...and 2 more figures
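
The two measurements named in the Figure 2(b) caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Grad-CAM heatmap and the feature vectors have already been computed elsewhere, and the function names and the 0.5 high-attention threshold are hypothetical choices, since the caption states neither the threshold nor the feature extractor.

```python
import numpy as np


def feature_quantity(cam, threshold=0.5):
    """Fraction of pixels whose min-max normalized attention exceeds `threshold`.

    `cam` is a 2D attention heatmap (e.g. from Grad-CAM), assumed precomputed.
    The 0.5 threshold is an assumption; the caption does not give one.
    """
    cam = np.asarray(cam, dtype=np.float64)
    span = cam.max() - cam.min()
    if span == 0:
        # A flat heatmap has no distinguishable high-attention region.
        return 0.0
    normalized = (cam - cam.min()) / span
    return float((normalized > threshold).mean())


def feature_quality(perturbation_feature, target_features):
    """Cosine similarity between one feature vector and the class-mean feature.

    `perturbation_feature` has shape (d,); `target_features` has shape (n, d)
    and holds features of real target-class images (assumed precomputed).
    """
    v = np.asarray(perturbation_feature, dtype=np.float64)
    mean_target = np.asarray(target_features, dtype=np.float64).mean(axis=0)
    denom = np.linalg.norm(v) * np.linalg.norm(mean_target)
    if denom == 0:
        # Cosine similarity is undefined for a zero vector; report 0.
        return 0.0
    return float(v @ mean_target / denom)
```

Multiplying the value returned by `feature_quantity` by 100 gives the percentage form used in the caption. Under this reading, a larger high-attention area corresponds to more implanted target features, and a cosine similarity closer to 1 corresponds to perturbation features that better match real target-class images.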