Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

TL;DR

This paper tackles black-box model inversion for face data by replacing costly iterative latent-code optimization with training-based inversion. It introduces the Prediction-to-Image (P2I) framework, which pairs a Prediction Alignment Encoder with a fixed StyleGAN generator to map prediction vectors into the disentangled $\mathcal{W}^+$ latent space, so that a target identity can be reconstructed in a single forward pass. A second contribution is the aligned ensemble attack, which aggregates latent codes from multiple public images to capture complementary facial attributes and improves reconstruction quality with far fewer queries than prior methods. Across multiple datasets and target-model architectures, P2I achieves higher attack accuracy and perceptual quality while sharply reducing query counts (on CelebA, 8.5% higher attack accuracy and 99% fewer queries than RLB-MI), which points to a practical privacy risk in black-box settings.

Abstract

A model inversion (MI) attack reconstructs the private training data of a target model from its output, posing a significant threat to deep learning models and data privacy. On the one hand, most existing MI methods search for latent codes that represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unrealistic, especially in the black-box scenario. On the other hand, some training-based methods launch an attack through a single forward inference but fail to directly learn high-level mappings from prediction vectors to images. To address these limitations, we propose a novel Prediction-to-Image (P2I) method for black-box MI attacks. Specifically, we introduce the Prediction Alignment Encoder to map the target model's output prediction into the latent code of StyleGAN. In this way, the prediction vector space can be well aligned with the more disentangled latent space, thus establishing a connection between prediction vectors and semantic facial features. During the attack phase, we further design the Aligned Ensemble Attack scheme to integrate complementary facial attributes of the target identity for better reconstruction. Experimental results show that our method outperforms other state-of-the-art methods; e.g., compared with RLB-MI, our method improves attack accuracy by 8.5% and reduces the number of queries by 99% on the CelebA dataset.

Paper Structure

This paper contains 21 sections, 9 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Previous optimization-based methods iteratively update the latent vector $z$ within a fixed prior generator, which requires an enormous number of queries to the target model. In contrast, our method is training-based: it optimizes a prediction-to-image inversion model and reconstructs face images through a single forward inference.
  • Figure 2: Overall pipeline of the P2I method. We first form the training data by selecting the top-$n$ public images with the highest confidence for each identity. The Prediction Alignment Encoder (PAE) maps prediction vectors into latent codes in the disentangled $\mathcal{W}^+$ space, which are then fed into the fixed StyleGAN generator to reconstruct a high-fidelity target image. Furthermore, we introduce the aligned ensemble attack to integrate different $w$, which essentially aims to find the centroid $w_{ens}$ and bring it closer to the target identity's $w_{id}$, contributing to better attack performance.
  • Figure 3: Visualizations of interpolation on prediction vectors along the target dimension. As the prediction value increases, the reconstructed image gradually approaches the target's visual appearance. This is consistent with the decreasing normalized distance $Dist_w$ between the latent codes of the target and the reconstructed image. In addition, results on facial attribute classification and identity recognition (especially the zoomed-in regions of the mouth and eyes) also support the prediction-$\mathcal{W}^+$-image alignment.
  • Figure 4: (a)-(c) compare results for input predictions under single/ensembled and private/public settings. (d) shows the sensitivity to the hyper-parameter $m$.
  • Figure 5: Visual comparison of different model inversion attacks.
  • ...and 5 more figures
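
The Figure 2 caption describes two data-handling steps that can be illustrated in a few lines: selecting the top-$n$ public images with the highest confidence for each identity, and combining several latent codes $w$ into a centroid $w_{ens}$. The sketch below is a minimal illustration only, not the authors' implementation. The function names `select_top_n` and `ensemble_centroid` are hypothetical, ranking every public image by its confidence for each identity is one assumed reading of the caption, and the uniform (unweighted) mean is an assumption, since the caption does not say how the centroid is computed. Latent codes are shown as short flat lists rather than real $\mathcal{W}^+$ tensors.

```python
def select_top_n(predictions, n):
    """Pick, for each identity, the n public images with highest confidence.

    predictions: one prediction vector per public image, each a list of
    per-identity confidences returned by the target model (assumed format).
    Returns {identity: [image indices, highest confidence first]}.
    """
    num_identities = len(predictions[0])
    selected = {}
    for identity in range(num_identities):
        # Rank all public images by their confidence for this identity.
        ranked = sorted(
            range(len(predictions)),
            key=lambda i: predictions[i][identity],
            reverse=True,
        )
        selected[identity] = ranked[:n]
    return selected


def ensemble_centroid(latents):
    """Element-wise mean of latent codes (uniform weights are an assumption)."""
    count = len(latents)
    dim = len(latents[0])
    return [sum(w[d] for w in latents) / count for d in range(dim)]


# Toy data: 4 public images, 3 identities (values are made up).
toy_predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
top2 = select_top_n(toy_predictions, n=2)

# Toy latent codes standing in for encoder outputs of two selected images.
w_ens = ensemble_centroid([[1.0, 2.0], [3.0, 4.0]])
```

In this toy run, identity 0 is paired with images 0 and 2, and the two latent codes average to `[2.0, 3.0]`. In the pipeline described above, the latent codes would come from the Prediction Alignment Encoder, and the centroid would be passed to the fixed StyleGAN generator.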