Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning
Huan Bao, Kaimin Wei, Yongdong Wu, Jin Qian, Robert H. Deng
TL;DR
DBB-MI is a distributional black-box model inversion attack that requires neither access to the target model's parameters nor specialized GAN training. It learns a probabilistic latent space for data reconstruction by coordinating two agents via MADDPG to optimize the mean $\mu$ and variance $\sigma$ of the latent distribution, then samples latent codes from that distribution and recovers private data through a GAN trained on public data. Across CelebA, FaceScrub, PubFig83, FFHQ, and MNIST, DBB-MI surpasses state-of-the-art white-box and black-box MI baselines in attack accuracy (ACC), K-nearest-neighbor feature distance (KNN Dist), and PSNR, which underscores the effectiveness of latent-distribution exploration for exposing privacy leakage in face-recognition models. The results show that private training data can leak even in black-box settings and that MARL-based latent-space optimization can significantly strengthen MI attacks, informing both defenses against such attacks and privacy-preserving model design.
Abstract
A Model Inversion (MI) attack based on Generative Adversarial Networks (GANs) aims to recover private training data from complex deep learning models by searching for codes in the latent space. However, existing attacks search only a deterministic latent space, so the latent code they find is usually suboptimal. In addition, existing distributional MI schemes assume that the attacker can access the structure and parameters of the target model, which is not always viable in practice. To overcome these shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack that constructs a probabilistic latent space in which to search for the target private data. Specifically, DBB-MI needs neither the target model's parameters nor specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. It then randomly samples latent codes from this distribution to recover the private data. Because the latent probability distribution closely aligns with the target private data in latent space, the recovered data leaks significant private information about the training samples of the target model. Extensive experiments on diverse datasets and networks show that DBB-MI outperforms state-of-the-art methods in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.
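
To make the idea of searching over a latent distribution with only black-box queries concrete, the following is a minimal sketch and not the paper's implementation. It replaces MADDPG with a simple accept-if-better random search in which one proposal adjusts the mean and another adjusts the spread, and it uses toy stand-ins for the GAN generator and the target model. All function names (`toy_generator`, `toy_target_confidence`, `search_latent_distribution`), the linear generator, the confidence function, and the step sizes are assumptions made purely for illustration.

```python
import numpy as np

def toy_generator(z, W):
    """Hypothetical stand-in for a GAN generator trained on public data."""
    return np.tanh(z @ W)

def toy_target_confidence(x, prototype):
    """Hypothetical black-box target: exposes only a confidence in (0, 1]."""
    return float(np.exp(-np.mean((x - prototype) ** 2)))

def sample_latents(mu, sigma, n, rng):
    """Draw n latent codes z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + sigma * eps

def distribution_score(mu, sigma, W, prototype, n, rng):
    """Monte Carlo estimate of the mean target confidence under N(mu, sigma)."""
    zs = sample_latents(mu, sigma, n, rng)
    scores = [toy_target_confidence(toy_generator(z, W), prototype) for z in zs]
    return float(np.mean(scores))

def search_latent_distribution(W, prototype, latent_dim, steps=200, n=16, seed=0):
    """Accept-if-better random search over (mu, sigma).

    This is a deliberately simple substitute for the two MADDPG agents:
    one proposal perturbs the mean, the other rescales the spread.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(latent_dim)
    sigma = np.ones(latent_dim)
    initial = distribution_score(mu, sigma, W, prototype, n, rng)
    best = initial
    for _ in range(steps):
        mu_new = mu + 0.1 * rng.standard_normal(latent_dim)
        sigma_new = np.clip(
            sigma * np.exp(0.1 * rng.standard_normal(latent_dim)), 1e-3, 2.0
        )
        score = distribution_score(mu_new, sigma_new, W, prototype, n, rng)
        if score > best:  # keep the proposal only if the noisy estimate improves
            mu, sigma, best = mu_new, sigma_new, score
    return mu, sigma, initial, best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((8, 32)) / np.sqrt(8)          # toy generator weights
    prototype = toy_generator(rng.standard_normal(8), W)   # toy "private" sample
    mu, sigma, initial, best = search_latent_distribution(W, prototype, latent_dim=8)
    print("initial score:", initial, "best score:", best)
    # Final step of the attack: sample codes from the learned distribution.
    recovered = toy_generator(sample_latents(mu, sigma, 4, rng), W)
    print("recovered batch shape:", recovered.shape)
```

The search only ever reads the confidence returned by the target, which mirrors the black-box assumption. Because each score is a Monte Carlo estimate, the accepted "best" value is noisy; a learned policy such as the one the paper describes would presumably explore this space far more efficiently than random proposals.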
