Latent Code Augmentation Based on Stable Diffusion for Data-free Substitute Attacks

Mingwen Shao, Lingzhuang Meng, Yuanjian Qiao, Lixu Zhang, Wangmeng Zuo

TL;DR

This work proposes Latent Code Augmentation (LCA) to help Stable Diffusion (SD) generate data that aligns with the data distribution of the target model: the latent codes of inferred member data are augmented with LCA and then used as guidance for SD.

Abstract

Since the training data of the target model is unavailable in black-box substitute attacks, most recent schemes use GANs to generate data for training the substitute model. However, these GAN-based schemes suffer from low generation quality and low training efficiency, because the generator must be retrained for each target model during substitute training. To overcome these limitations, we use a diffusion model to generate data and propose a novel data-free substitute attack scheme based on Stable Diffusion (SD) that improves the efficiency and accuracy of substitute training. Although the data generated by SD is of high quality, its domain distribution differs from that of the target model, and the proportion of positive and negative samples, as judged by the target model, varies widely. To address this problem, we propose Latent Code Augmentation (LCA), which helps SD generate data that aligns with the data distribution of the target model. Specifically, we augment the latent codes of the inferred member data with LCA and use them as guidance for SD. With this guidance, the data generated by SD both meets the discriminative criteria of the target model and exhibits high diversity. Training on this data yields a substitute model that closely resembles the target model, and does so more efficiently. Extensive experiments demonstrate that LCA achieves higher attack success rates with smaller query budgets than GAN-based schemes across different target models. Our code is available at https://github.com/LzhMeng/LCA.
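To make the black-box setting concrete, here is a minimal sketch of one common form of substitute training under a label-only query budget. It is not the authors' implementation: the target is assumed to return only top-1 labels, the substitute loss is assumed to be plain cross-entropy against those labels (the abstract does not specify its form), and the names `BlackBoxTarget` and `substitute_loss` are hypothetical.

```python
import numpy as np


class BlackBoxTarget:
    """Hypothetical wrapper around a label-only target model that enforces a query budget."""

    def __init__(self, predict_fn, budget: int):
        self.predict_fn = predict_fn  # maps a batch of inputs to top-1 class labels
        self.budget = budget
        self.used = 0

    def query(self, x: np.ndarray) -> np.ndarray:
        # Every image sent to the target costs one query.
        if self.used + len(x) > self.budget:
            raise RuntimeError("query budget exhausted")
        self.used += len(x)
        return self.predict_fn(x)


def substitute_loss(sub_logits: np.ndarray, target_labels: np.ndarray) -> float:
    """Assumed substitute loss: mean cross-entropy against the target's hard labels."""
    z = sub_logits - sub_logits.max(axis=1, keepdims=True)  # stabilise the log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target_labels)), target_labels].mean())


# Usage: label 8 generated samples with the target, then score an untrained substitute.
target = BlackBoxTarget(lambda x: x.argmax(axis=1), budget=1000)
batch = np.random.default_rng(0).normal(size=(8, 10))
labels = target.query(batch)
loss = substitute_loss(np.zeros((8, 10)), labels)  # uniform logits -> log(10) ≈ 2.3026
```

Counting queries per generated image mirrors how query budgets are usually reported; a scheme whose generated data is more informative per query reaches a given attack success rate with fewer of them.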

Paper Structure

This paper contains 11 sections, 13 equations, 3 figures.

Figures (3)

  • Figure 1: Comparison of our scheme with others. (a) Visualisation of the generated data, including data generated by GAN-based schemes during substitute training, by Stable Diffusion prompted with class labels, and by our LCA. (b) Per-class accuracy and average accuracy. The accuracy of data generated directly by Stable Diffusion varies widely between classes, with some classes below 10%, whereas the data generated by our LCA is more uniform across classes and has higher accuracy. (c) Attack success rate. Our LCA achieves a higher attack success rate with fewer queries. Here, 'full data' denotes substitute training with the full training set of the target model.
  • Figure 2: The framework of our scheme, which is divided into two stages. Stage 1: inferring member data that matches the distribution of the target model and encoding it into the codebook. Stage 2: guiding SD to generate high-quality images and training the substitute model. 'Encoder' is the image encoder 'AutoEncoderKL' of the pre-trained SD. The codebook has N classes and length M. $SiAug$ and $MuAug$ are the single-code and multi-code augmentation functions, respectively; $\phi$ and $\psi$ are different augmentation operations; $\mathcal{L}_{Sub}$ is the substitute training loss.
  • Figure 3: The process of generating data with Latent Code Augmentation. LCA augments the latent code of an image, and the augmented code is then used to guide SD in generating data. $SiAug$ and $MuAug$ are the single-code and multi-code augmentation functions, respectively, $t$ is the number of iterations, and $\phi$ and $\psi$ are different code augmentation operations.
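
Figure 1(b) reports per-class and average accuracy of generated data. A minimal sketch of how such numbers can be computed, assuming "accuracy" means the fraction of images generated for class c that the target model labels as c, and assuming the average is an unweighted mean over classes (the captions do not say which averaging is used). `per_class_accuracy` is a hypothetical helper.

```python
import numpy as np


def per_class_accuracy(intended: np.ndarray, predicted: np.ndarray, num_classes: int):
    """Per-class agreement between the class an image was generated for and the target's label.

    Returns (per_class, average). Classes with no generated samples are NaN and
    are ignored by the (assumed) unweighted class average.
    """
    per_class = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = intended == c
        if mask.any():
            per_class[c] = float((predicted[mask] == c).mean())
    return per_class, float(np.nanmean(per_class))


intended = np.array([0, 0, 1, 1, 2, 2])   # class each image was generated for
predicted = np.array([0, 0, 1, 0, 2, 1])  # top-1 label returned by the target
acc, avg = per_class_accuracy(intended, predicted, num_classes=3)
# acc -> [1.0, 0.5, 0.5], avg -> 0.666...
```

A wide spread in `acc`, as in the class-label-prompted SD data of Figure 1(b), means some classes contribute mostly samples the target disagrees with.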
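
Stage 1 in Figure 2 can be pictured as filling a per-class codebook with latent codes of inferred member data. The sketch below is hypothetical throughout: it treats a sample as member data when the target's top-1 confidence is at least a threshold (the captions do not say how membership is inferred), keeps up to M codes per class, and takes the encoder as a plain callable. In a real SD pipeline that callable would wrap the VAE, for example diffusers' `AutoencoderKL`, whose `encode(...)` output exposes a `latent_dist`; here a toy 2x2 average-pooling encoder stands in so the example runs without model weights.

```python
from typing import Callable, Dict, List

import numpy as np


def build_codebook(
    images: np.ndarray,
    target_probs: np.ndarray,
    encode: Callable[[np.ndarray], np.ndarray],
    length: int,
    threshold: float = 0.9,
) -> Dict[int, List[np.ndarray]]:
    """Hypothetical Stage 1: keep up to `length` (M) confident samples per class, stored as latents."""
    num_classes = target_probs.shape[1]  # N in Figure 2
    labels = target_probs.argmax(axis=1)
    conf = target_probs.max(axis=1)
    codebook: Dict[int, List[np.ndarray]] = {c: [] for c in range(num_classes)}
    # Visit samples from most to least confident so each class keeps its best candidates.
    for i in np.argsort(-conf):
        c = int(labels[i])
        if conf[i] >= threshold and len(codebook[c]) < length:
            codebook[c].append(encode(images[i]))
    return codebook


def toy_encode(img: np.ndarray) -> np.ndarray:
    """Stand-in for the SD image encoder: 2x2 average pooling of an HxW image (H, W even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

Storing latents rather than pixels means the later augmentation step operates directly in the space SD denoises in.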
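
Figure 3 names two augmentation functions, $SiAug$ (operation $\phi$ on a single code) and $MuAug$ (operation $\psi$ on multiple codes), but the captions do not define $\phi$ and $\psi$. The sketch below fills them with assumed, illustrative choices: $\phi$ as a small Gaussian perturbation of one latent and $\psi$ as a random convex combination of two latents from the same class. It also assumes that each of the $t$ iterations yields one augmented code. All function names are hypothetical.

```python
from typing import List

import numpy as np


def si_aug(code: np.ndarray, rng: np.random.Generator, noise_std: float = 0.1) -> np.ndarray:
    """Assumed single-code augmentation (phi): small Gaussian perturbation of one latent."""
    return code + rng.normal(0.0, noise_std, size=code.shape)


def mu_aug(codes: List[np.ndarray], rng: np.random.Generator) -> np.ndarray:
    """Assumed multi-code augmentation (psi): random convex mix of two same-class latents."""
    i, j = rng.choice(len(codes), size=2, replace=False)
    lam = rng.uniform(0.0, 1.0)
    return lam * codes[i] + (1.0 - lam) * codes[j]


def lca_guidance_codes(
    class_codes: List[np.ndarray], t: int, rng: np.random.Generator, p_multi: float = 0.5
) -> List[np.ndarray]:
    """Produce t augmented latent codes for one class (one per assumed iteration)."""
    out = []
    for _ in range(t):
        if len(class_codes) >= 2 and rng.uniform() < p_multi:
            out.append(mu_aug(class_codes, rng))
        else:
            k = rng.integers(len(class_codes))  # pick one stored code at random
            out.append(si_aug(class_codes[k], rng))
    return out
```

Each returned code has the same shape as the stored latents, so it could be handed to SD wherever an initial latent is accepted, for instance as the starting point of an image-to-image style denoising run; whether LCA injects its guidance this way is not stated in this summary.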