Breaking the Black-Box: Confidence-Guided Model Inversion Attack for Distribution Shift

Xinhao Liu; Yingzhao Jiang; Zetao Lin

TL;DR

The paper addresses the challenge of performing high-resolution model inversion attacks in black-box settings under distribution shift. It proposes Confidence-Guided MIA (CG-MI), which uses a pre-trained StyleGAN2 image prior and a mapping network to constrain optimization in the latent space, coupled with a gradient-free CMA-ES optimizer to minimize a Confidence Matching Loss. CG-MI achieves state-of-the-art performance on black-box MIAs across diverse data distributions, with results approaching white-box attacks in image quality and transferability, and demonstrates robust performance under distribution shifts. This method provides a practical and effective framework for black-box MIAs, highlighting privacy risks and the need for defenses in real-world deployments.
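
The optimization described above can be written compactly. This is a sketch in assumed notation, not the paper's own: $G_{map}$ and $G_{syn}$ stand for the StyleGAN2 mapping and synthesis networks, $T$ for the black-box target model returning a confidence vector, $\hat{y}_c$ for the one-hot vector of target class $c$, and $\mathcal{L}_{CM}$ for the Confidence Matching Loss, whose exact form is not given in this summary.

```latex
% Assumed notation; the search runs over the latent vector z only,
% and T is accessed purely through its output confidences.
z^{*} = \arg\min_{z} \; \mathcal{L}_{CM}\!\left( T\big(G_{syn}(G_{map}(z))\big),\; \hat{y}_c \right),
\qquad
\hat{x} = G_{syn}\big(G_{map}(z^{*})\big)
```

Because only the values of $T$ are needed, never its gradients, a gradient-free optimizer such as CMA-ES can carry out the minimization.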

Abstract

Model inversion attacks (MIAs) seek to infer the private training data of a target classifier by querying the model and generating synthetic images that reflect the characteristics of the target class. However, prior studies have relied on full access to the target model, which is impractical in real-world scenarios. In addition, existing black-box MIAs assume that the image prior and the target model follow the same distribution, so they can perform suboptimally when the data distributions differ. To address these limitations, this paper proposes a Confidence-Guided Model Inversion attack method called CG-MI, which uses the latent space of a pre-trained, publicly available generative adversarial network (GAN) as prior information together with a gradient-free optimizer, enabling high-resolution MIAs across different data distributions in a black-box setting. Our experiments demonstrate that our method significantly outperforms the SOTA black-box MIA by more than 49% for CelebA and 58% for Facescrub in different distribution settings. Furthermore, our method can generate high-quality images comparable to those produced by white-box attacks. Our method provides a practical and effective solution for black-box model inversion attacks.

Paper Structure (16 sections, 8 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Threat Model
Methodology
Background
Breaking the Black-Box
Experiments
Experimental Settings
Experimental Results
Ablation Study
Discussion, Limitations and Conclusion
Confidence Matching Loss
Experimental Supplement
Datasets
Publicly Available Image Prior
...and 1 more sections

Figures (6)

Figure 1: Illustration of private training data leakage for a specific target class $c$ via the target model output confidence: ① Adversary inputs the initially generated image into the target model. ② Adversary obtains the model output confidence $y$. ③ Adversary attempts to reconstruct the private training data from the confidence $y$ to $\hat{y}_c$ in a black-box setting, where $\hat{y}_c$ is the one-hot vector for class $c$. The existing method is the optimization result in $W$ space.
Figure 2: Overview of the proposed attack. Latent vectors $z$ are sampled from a standard normal distribution $N(0,1)$ and passed through a mapping network to obtain style vectors $w$. The style vectors $w$ are fed into a synthesis network to generate the corresponding images. These images are then input to the target model, and the loss is computed from the objective function. The latent vectors $z$ are updated using a gradient-free optimization method, and the process repeats until optimized synthesized images are obtained.
Figure 3: Comparison between low and high transferability of synthesized images in MIAs targeting the same identity. The first image in the second column is an attack result [han2023reinforcement] generated using DCGAN, while the second image in the second column is an attack result obtained by combining the objective function proposed by PPA [struppek2022ppa] with gradient-free optimization. The third column displays attack results generated by combining our proposed objective function with a gradient-free optimization algorithm. The scores inside the images are the confidence scores given by the evaluation model.
Figure 4: Visual comparison of attack results for different methods in the setting where $P(X_{prior})$ = FFHQ, $P(X_{target})$ = CelebA, and the target model architecture is ResNet18. The first row shows ground-truth images of the target class. The second row shows PPA [struppek2022ppa], the third row RLB-MI [han2023reinforcement], and the fourth row Brep-MI [labelonly]. The last row shows our proposed method, CG-MI.
Figure 5: CG-MI attack results on the DenseNet169 network architecture for the CelebA, Facescrub, and Stanford Dogs datasets.
...and 1 more figures
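
The loop described in the Figure 2 caption (sample $z$, map to $w$, synthesize an image, query the target model, update $z$ without gradients) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation. The "networks" are hypothetical fixed random linear maps standing in for StyleGAN2 and the target classifier. The loss is an assumed squared distance between the confidence vector and the one-hot target, since the paper's Confidence Matching Loss is not specified in this summary. The optimizer is a simplified elite-averaging evolution strategy, not full CMA-ES.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, STYLE_DIM, IMAGE_DIM, NUM_CLASSES = 8, 16, 32, 5

# Hypothetical stand-ins: random linear maps instead of StyleGAN2 and a real classifier.
W_MAP = rng.normal(size=(LATENT_DIM, STYLE_DIM))
W_SYN = rng.normal(size=(STYLE_DIM, IMAGE_DIM))
W_CLS = rng.normal(size=(IMAGE_DIM, NUM_CLASSES))


def mapping_network(z):
    """z -> w (toy replacement for the GAN mapping network)."""
    return np.tanh(z @ W_MAP)


def synthesis_network(w):
    """w -> flattened 'image' (toy replacement for the GAN synthesis network)."""
    return np.tanh(w @ W_SYN)


def query_target_model(images):
    """Black-box access: only softmax confidences are returned, no gradients."""
    logits = images @ W_CLS
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def confidence_loss(z, target_class):
    """Assumed loss: squared distance between confidences and the one-hot target."""
    conf = query_target_model(synthesis_network(mapping_network(z)))
    one_hot = np.eye(NUM_CLASSES)[target_class]
    return ((conf - one_hot) ** 2).sum(axis=1)


def invert(target_class, generations=200, population=32, elite=8, sigma=0.5):
    """Simplified evolution strategy over z (a stand-in for CMA-ES)."""
    mean = rng.normal(size=LATENT_DIM)  # z ~ N(0, I), as in the Figure 2 caption
    best_z = mean
    best_loss = confidence_loss(mean[None, :], target_class)[0]
    for _ in range(generations):
        # Sample candidate latents around the current mean and score them by queries only.
        candidates = mean + sigma * rng.normal(size=(population, LATENT_DIM))
        losses = confidence_loss(candidates, target_class)
        order = np.argsort(losses)
        # Move the search mean to the average of the best candidates.
        mean = candidates[order[:elite]].mean(axis=0)
        if losses[order[0]] < best_loss:
            best_z, best_loss = candidates[order[0]], losses[order[0]]
        sigma *= 0.99  # slowly narrow the search
    return best_z, best_loss
```

Calling `z, loss = invert(target_class=2)` returns the best latent found and its loss; `synthesis_network(mapping_network(z[None, :]))` then gives the reconstructed toy image. A real attack would swap the stand-ins for a pre-trained generator and remote model queries, and would likely use a full CMA-ES implementation that also adapts the covariance of the search distribution.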
