Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model

Rongke Liu, Dong Wang, Yizhi Ren, Zhen Wang, Kaitian Guo, Qianqian Qin, Xiaolei Liu

TL;DR

This work tackles the privacy risk posed by model inversion attacks in the label-only setting, where attackers receive only predicted labels. It introduces a label-conditioned diffusion model that is trained on an auxiliary dataset and then used to generate multiple candidate samples for a target label without gradient access. Through gamma correction, random transformations, and top-k filtering, the approach achieves higher perceptual similarity and realism (as measured by LPIPS) and better target accuracy than prior methods. These results highlight a practical vulnerability in label-only access scenarios and motivate both defense development and further efficiency improvements.

Abstract

Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models, posing a privacy threat. Existing MIAs primarily focus on the white-box scenario, where attackers have full access to the model's structure and parameters. In practice, however, models are usually deployed in black-box or label-only scenarios, i.e., attackers can obtain only the output confidence vectors or labels by querying the model. As a result, the attack models in existing MIAs are difficult to train effectively with such limited knowledge of the target model, resulting in sub-optimal attacks. To the best of our knowledge, we pioneer the research of a powerful and practical attack model in the label-only scenario. In this paper, we develop a novel MIA method that leverages a conditional diffusion model (CDM) to recover representative samples of the target label from the training set. Two techniques are introduced: (1) selecting an auxiliary dataset relevant to the target model's task and using predicted labels as conditions to guide CDM training; and (2) feeding the target label, a pre-defined guidance strength, and random noise into the trained attack model to generate and correct multiple results for final selection. The method is evaluated using Learned Perceptual Image Patch Similarity (LPIPS) as a new metric and as a basis for choosing hyper-parameter values. Experimental results show that the method generates similar and accurate samples for the target label, outperforming the generators of previous approaches.

Paper Structure (34 sections, 7 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Training class inference of our and previous approaches against a facial recognition classifier in the label-only scenario. For better comparison with our method, we turned the labels into correct one-hot vectors to train (b)'s attack model (yang2019neural) for recovering optimal color images. Note that the correct one-hot vector implies that the confidence value at the target position is 1, while the rest are 0. For instance, if there are a total of 3 classes and the target is class 1, then the one-hot vector would be (1,0,0). Moreover, the auxiliary dataset used by both is the same.
  • Figure 2: The attack overview of the proposed label-only model inversion attack method.
  • Figure 3: The first step in the recovery phase is to input noise, target labels, and guidance strength $\omega$ to the trained diffusion model and denoise them step by step to obtain the generated image, and eventually correct it.
  • Figure 4: Effect of variation in the target model’s per-individual classification precision on weight filtering. (a) shows true images of each individual, together with the target model’s test precision rate for that individual. (b) shows the two best results after filtering according to the weights $\mathbb{E}[\delta (F_W(T(G_\theta(\mathbf{z})^{\gamma})),l)]$.
  • Figure 5: Qualitative evaluation and attack performance comparison of various methods, depending on whether the FaceScrub and CelebA datasets overlap.
  • ...and 4 more figures
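
The filtering weight in the Figure 4 caption, $\mathbb{E}[\delta (F_W(T(G_\theta(\mathbf{z})^{\gamma})),l)]$, can be read as the fraction of randomly transformed, gamma-corrected copies of a generated image that the target model still labels as $l$. The sketch below is one possible Monte Carlo reading of that weight followed by top-k selection, not the authors' implementation. The function names, the specific transform (random flip plus a small shift), the sample count, and the toy classifier are all assumptions made for illustration; only label queries to the target model are used.

```python
import numpy as np

def gamma_correct(img, gamma):
    """Gamma correction for an image with pixel values in [0, 1]."""
    return np.clip(img, 0.0, 1.0) ** gamma

def random_transform(img, rng):
    """Hypothetical stand-in for T: random horizontal flip plus a small shift."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    shift = int(rng.integers(-2, 3))  # shift in {-2, ..., 2} pixels
    return np.roll(out, shift, axis=1)

def label_weight(img, target_label, query_label, gamma, n_transforms, rng):
    """Monte Carlo estimate of E[delta(F_W(T(G(z)^gamma)), l)].

    query_label is the only access to the target model: image -> label.
    """
    corrected = gamma_correct(img, gamma)
    hits = sum(
        query_label(random_transform(corrected, rng)) == target_label
        for _ in range(n_transforms)
    )
    return hits / n_transforms

def top_k(images, target_label, query_label, k=2, gamma=1.0,
          n_transforms=16, seed=0):
    """Return indices and weights of the k candidates with the highest weight."""
    rng = np.random.default_rng(seed)
    weights = np.array([
        label_weight(img, target_label, query_label, gamma, n_transforms, rng)
        for img in images
    ])
    order = np.argsort(-weights, kind="stable")[:k]  # ties keep input order
    return [int(i) for i in order], weights[order]

# Toy usage: a fake label-only "classifier" that thresholds mean brightness.
toy_query = lambda x: int(x.mean() > 0.5)
candidates = [np.full((8, 8), v) for v in (0.9, 0.1, 0.6)]
best, best_weights = top_k(candidates, target_label=1, query_label=toy_query, k=2)
```

In this toy setup the flip and shift leave mean brightness unchanged, so the two bright candidates receive weight 1 and the dark one weight 0. With a real classifier the weight would vary with how robustly each generated image keeps its predicted label under transformation.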