Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning

Huan Bao; Kaimin Wei; Yongdong Wu; Jin Qian; Robert H. Deng

TL;DR

DBB-MI is a distributional black-box model inversion attack that requires neither access to the target model's parameters nor specialized GAN training. It learns a probabilistic latent space for data reconstruction by coordinating two agents via MADDPG to optimize the mean $\mu$ and variance $\sigma$ of the latent distribution, then samples latent codes from that distribution and recovers private data through a GAN trained on public data. Across CelebA, FaceScrub, Pubfig83, FFHQ, and MNIST, DBB-MI surpasses state-of-the-art white-box and black-box MI baselines in ACC, KNN Dist, and PSNR, which underscores the effectiveness of latent-distribution exploration for exposing privacy leakage in face-recognition models. The results reveal substantial privacy risks in black-box settings and show that MARL-based latent-space optimization can significantly strengthen MI attacks, which should inform both defenses against such attacks and privacy-preserving design.
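The sample-then-generate step described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_generator` and `toy_black_box` are hypothetical stand-ins for the public-data GAN and the target model, $\sigma$ is treated as a per-dimension standard deviation (an assumption), and the MADDPG updates of $\mu$ and $\sigma$ are omitted entirely. The sketch only shows how a candidate latent distribution could be scored using black-box outputs alone.

```python
import numpy as np

def sample_latents(mu, sigma, n, rng):
    """Draw n latent codes z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + np.abs(sigma) * eps

def toy_generator(z):
    # Hypothetical stand-in for a GAN generator trained on public data.
    return np.tanh(z)

def toy_black_box(x, target):
    # Hypothetical stand-in for the target model: it exposes only a
    # confidence-like score per input, never parameters or gradients.
    return np.exp(-np.sum((x - target) ** 2, axis=1))

def score_distribution(mu, sigma, target, n=64, seed=0):
    """Average black-box score of samples drawn from N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    z = sample_latents(mu, sigma, n, rng)
    return float(toy_black_box(toy_generator(z), target).mean())

# A distribution centred on the target's latent code scores higher
# than one centred far away from it.
true_code = np.array([0.5, -0.5])
target = toy_generator(true_code)
good = score_distribution(true_code, np.array([0.01, 0.01]), target)
bad = score_distribution(np.array([3.0, 3.0]), np.array([0.01, 0.01]), target)
print(good > bad)
```

In this toy setting, an average score of this kind is the sort of signal that agents adjusting $\mu$ and $\sigma$ could use as feedback; how DBB-MI actually defines its reward is covered in the paper's Reward section, not here.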

Abstract

A Model Inversion (MI) attack based on Generative Adversarial Networks (GANs) aims to recover the private training data of complex deep learning models by searching for codes in the latent space. However, existing attacks search only a deterministic latent space, so the latent code they find is usually suboptimal. In addition, existing distributional MI schemes assume that the attacker can access the structure and parameters of the target model, which is not always feasible in practice. To overcome these shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack that constructs a probabilistic latent space in which to search for the target private data. Specifically, DBB-MI needs neither the target model's parameters nor specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. It then randomly samples latent codes from this distribution to recover the private data. Because the latent probability distribution closely aligns with the target private data in latent space, the recovered data significantly leaks the privacy of the target model's training samples. Extensive experiments on diverse datasets and networks show that DBB-MI outperforms the state of the art in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.
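Of the three metrics named above, Peak Signal-to-Noise Ratio has a standard closed form, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it for a pair of images; the pixel range (`max_value=1.0`) and the lack of any per-dataset averaging are assumptions, since the paper's exact evaluation settings are not stated here.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
real = np.zeros((8, 8))
recovered = np.full((8, 8), 0.1)
print(round(psnr(real, recovered), 6))
```

A higher PSNR means the reconstructed image is closer, pixel by pixel, to the private one.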

Paper Structure (39 sections, 14 equations, 14 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Multi-Agent Reinforcement Learning
Model Inversion Attacks
The Proposed Approach: DBB-MI
Problem Formulation
Attack model
Stochastic Game for Latent Distribution Search
Overview
MADDPG for Searching Latent Space Distribution
MADDPG
Action
Reward
Distribution optimization
MADDPG for Agent Training
...and 24 more sections

Figures (14)

Figure 1: The overview of multi-agent reinforcement learning.
Figure 2: The overview of DBB-MI. To search for the target latent space distribution, two agents are employed to optimize the target distribution's $\mu$ and $\sigma$, respectively.
Figure 3: The overall collaboration and competition between two agents. The actor network makes decisions by observing the environmental state, while the critic network feeds feedback to the actor network according to its global observations.
Figure 4: The overall steps of one-dimensional latent distribution search. From random latent distribution to real latent distribution.
Figure 5: The images reconstructed by different MI attacks under CelebA and VGG16. The top row displays the real images, the middle two rows show the images reconstructed by the white-box MI baselines, and the bottom four rows exhibit the images reconstructed by the black-box MI baselines and our method DBB-MI.
...and 9 more figures
