Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

TL;DR

This work introduces a novel target-specific conditional diffusion model (CDM) that deliberately approximates the target classifier's private distribution and achieves a superior accuracy-fidelity balance. It also proposes an improved max-margin loss that replaces the hard max with top-k maxes, fully leveraging feature information and soft labels from the target classifier.

Abstract

Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space. To alleviate these issues, we leverage the remarkable synthesis capabilities of diffusion models and propose Diffusion-based Model Inversion (Diff-MI) attacks. Specifically, we introduce a novel target-specific conditional diffusion model (CDM) that deliberately approximates the target classifier's private distribution and achieves a superior accuracy-fidelity balance. Our method involves a two-step learning paradigm. Step-1 incorporates the target classifier into the entire CDM learning process in a pretrain-then-finetune fashion, creating pseudo-labels as model conditions in pretraining and adjusting specified layers with image predictions in fine-tuning. Step-2 presents an iterative image reconstruction method that further enhances attack performance by combining diffusion priors with target knowledge. Additionally, we propose an improved max-margin loss that replaces the hard max with top-k maxes, fully leveraging feature information and soft labels from the target classifier. Extensive experiments demonstrate that Diff-MI significantly improves generative fidelity, with an average decrease of 20% in FID, while maintaining competitive attack accuracy compared to state-of-the-art methods across various datasets and models. Our code is available at: https://github.com/Ouxiang-Li/Diff-MI.
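The abstract describes the improved max-margin loss only in words, so the following is a minimal sketch rather than the paper's formula. It assumes the hard version takes the largest non-target logit minus the target logit, and that the "top-k maxes" are combined by averaging the k largest non-target logits. The function names, the averaging rule, and the default `k` are illustrative assumptions. The use of feature information and soft labels mentioned in the abstract is not modeled here.

```python
def max_margin_loss(logits, target):
    """Hard max-margin loss: largest non-target logit minus the target logit."""
    others = [v for i, v in enumerate(logits) if i != target]
    return max(others) - logits[target]


def topk_max_margin_loss(logits, target, k=3):
    """Assumed top-k variant: mean of the k largest non-target logits
    minus the target logit (averaging is an illustrative choice)."""
    others = sorted((v for i, v in enumerate(logits) if i != target), reverse=True)
    k = min(k, len(others))  # guard against k exceeding the number of other classes
    return sum(others[:k]) / k - logits[target]


# Toy logits for a 5-class classifier; class 1 is the attack target.
logits = [2.0, 5.0, 1.0, 4.0, 3.0]
hard = max_margin_loss(logits, target=1)
soft = topk_max_margin_loss(logits, target=1, k=3)
print(hard, soft)
```

Under these assumptions, setting `k=1` recovers the hard max-margin loss, while larger `k` lets several competing classes contribute to the loss instead of only the single strongest one.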

Paper Structure (14 sections, 9 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Fidelity degradation. We randomly select 4 target classes and visualize their distributional variation in GAN's latent space along with corresponding generated images, depicted with different colors (MIA Method = PLG-MI [yuan2023pseudo], Private Dataset = CelebA [liu2015deep], Public Dataset = CelebA, Target Classifier = Face.evoLVe [cheng2017know]). Through iterations, we observe that the latent variables, initially sampled from $\mathcal{N}(\mathbf{0}, \mathbf{I})$, tend to cluster together and constitute a new distribution with a deviation from $\mathcal{N}(\mathbf{0}, \mathbf{I})$. Meanwhile, reconstructed images gradually deteriorate visually during inversion because of the widening gap between the optimized latent distribution and the prior distribution. As a result, PLG-MI sacrifices generative fidelity for attack accuracy, which is also present in other GAN-based methods [zhang2020secret, chen2021knowledge].
  • Figure 2: Overview of our proposed two-step Diff-MI attacks. Step-1: We build a target-specific CDM on the public dataset to distill the target classifier's knowledge. This is achieved by pretraining a CDM with pseudo-labels produced by the target classifier as conditions, then fine-tuning a small subset of the pretrained CDM with the guidance of the target classifier (subscripts of layers in the middle block indicate their corresponding index). Step-2: We use an iterative image reconstruction method to involve both diffusion prior (i.e., $\mathcal{L}_{\text{prior}}$) and target classifier's knowledge (i.e., $\mathcal{L}_{\text{cls}}$). $\mathbf{x}_{T}$ in step-1 and $\mathbf{x}_{0}$ in step-2 are both initialized from $\mathcal{N}(\mathbf{0}, \mathbf{I})$.
  • Figure 3: Visual comparison of reconstructed images using different MIA methods ($\mathcal{D}_{\text{pri}}$ = CelebA, $\mathcal{D}_{\text{pub}}$ = CelebA, Target Classifier = VGG16). The first row shows the ground-truth private images of target labels of different identities.
  • Figure 4: Visual comparison of reconstructed images labeled from "001. Black-footed Albatross" to "015. Lazuli Bunting" between PLG-MI and our Diff-MI on CUB-200-2011.
  • Figure 5: Visual comparison of reconstructed chest X-rays labeled from "0" to "5" using different MIA methods ($\mathcal{D}_{\text{pri}}$ = ChestX-Ray, $\mathcal{D}_{\text{pub}}$ = CheXpert).
  • ...and 1 more figure