PPO-MI: Efficient Black-Box Model Inversion via Proximal Policy Optimization

Xinpeng Shou

TL;DR

This work tackles the privacy risk posed by black-box model inversion attacks on face recognition systems, showing that attackers can reconstruct private training samples using only model predictions. It proposes PPO-MI, which frames inversion as a sequential decision problem over the latent space of a pretrained generator $G$ and trains a PPO-based policy with a momentum-based state transition and a reward that balances classification success and exploration. The overall goal is to find a latent code $z^* = \arg\max_{z} P(y \mid T(G(z)))$, where $T$ is the target model and $y$ is the target class. Key contributions include a formal MDP formulation with continuous state and action spaces, a momentum-based transition mechanism, and a targeted reward design that yields high attack success with fewer queries and fewer training classes, reaching up to 79.7% attack accuracy with only 20K queries in experiments on CelebA, PubFig, and FaceScrub. These findings underscore practical privacy vulnerabilities in deployed models and motivate research on defenses against black-box inversion.
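To make the MDP framing concrete, here is a hypothetical toy environment: the state is a latent code z, the action is a continuous step, and the transition accumulates a velocity before moving z. The summary does not give the paper's exact transition or reward, so the momentum form (v <- beta * v + a, z <- z + v), the log-probability reward, the nearest-neighbour novelty bonus used for exploration, and the names `LatentInversionEnv`, `toy_target`, `beta`, and `explore_weight` are all assumptions made for illustration. `toy_target` is a stand-in for the black-box pipeline z -> P(y | T(G(z))).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


class LatentInversionEnv:
    """Hypothetical toy MDP for black-box inversion over a latent space.

    State: latent code z. Action: a continuous step in latent space.
    Transition (assumed momentum form): v <- beta * v + a, then z <- z + v.
    Reward (assumed form): log P(y | T(G(z))) plus a novelty bonus.
    Only the scalar probability returned by query_fn is observed.
    """

    def __init__(self, query_fn: Callable[[Vector], float], dim: int,
                 beta: float = 0.9, explore_weight: float = 0.1,
                 seed: int = 0) -> None:
        self.query_fn = query_fn              # stands in for z -> P(y | T(G(z)))
        self.dim = dim
        self.beta = beta                      # momentum coefficient (assumed)
        self.explore_weight = explore_weight  # exploration weight (assumed)
        self.rng = random.Random(seed)
        self.queries = 0                      # total black-box queries issued
        self.reset()

    def reset(self) -> Vector:
        # Start from a standard-normal latent code with zero velocity.
        self.z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        self.velocity = [0.0] * self.dim
        self.visited: List[Vector] = [list(self.z)]
        return list(self.z)

    def step(self, action: Sequence[float]) -> Tuple[Vector, float]:
        # Momentum transition: accumulate velocity, then move the latent code.
        self.velocity = [self.beta * v + a
                         for v, a in zip(self.velocity, action)]
        self.z = [zi + vi for zi, vi in zip(self.z, self.velocity)]

        prob = self.query_fn(self.z)          # one query to the black box
        self.queries += 1

        # Novelty bonus: distance to the nearest previously visited code.
        novelty = min(math.dist(self.z, p) for p in self.visited)
        self.visited.append(list(self.z))

        # Clamp the probability so log() stays finite.
        reward = math.log(max(prob, 1e-12)) + self.explore_weight * novelty
        return list(self.z), reward


def toy_target(center: Vector) -> Callable[[Vector], float]:
    """Hypothetical stand-in for T(G(z)): confidence peaks at `center`."""
    def query(z: Vector) -> float:
        sq = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
        return math.exp(-sq / len(z))
    return query
```

A policy (for example one trained with PPO) would emit the `action` vectors and learn from the returned rewards. The momentum term lets consecutive steps build on each other, and the novelty bonus is simply one common way to reward exploration, swapped in here because the summary does not state the paper's exploration term.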

Abstract

Model inversion attacks pose a significant privacy risk by attempting to reconstruct private training data from trained models. Most existing methods either depend on gradient estimation or require white-box access to model parameters, which limits their applicability in practical scenarios. In this paper, we propose PPO-MI, a novel reinforcement learning-based framework for black-box model inversion attacks. Our approach formulates the inversion task as a Markov Decision Process, where an agent navigates the latent space of a generative model to reconstruct private training samples using only model predictions. By employing Proximal Policy Optimization (PPO) with a momentum-based state transition mechanism, along with a reward function balancing prediction accuracy and exploration, PPO-MI achieves efficient latent space exploration and high query efficiency. Extensive experiments show that PPO-MI outperforms existing methods while requiring less attack knowledge, and that it is robust across various model architectures and datasets. These results underline its effectiveness and generalizability in practical black-box scenarios, raising important considerations for the privacy vulnerabilities of deployed machine learning models.
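PPO-MI optimizes its policy with Proximal Policy Optimization. As a reference point, the clipped surrogate objective from the original PPO formulation fits in a few lines. This is a generic sketch of standard PPO, not the paper's implementation: it omits the value-function and entropy terms a full PPO loss usually carries, and the default `clip_eps = 0.2` is a common choice assumed here rather than a value taken from the paper.

```python
import math
from typing import Sequence


def ppo_clipped_objective(logp_new: Sequence[float],
                          logp_old: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective (to be maximized).

    For each sample, r = exp(logp_new - logp_old) is the probability ratio
    between the updated policy and the policy that collected the data.
    Each term is min(r * A, clip(r, 1 - eps, 1 + eps) * A); the result is
    the mean over samples.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the old and new log-probabilities match, the ratio is 1 and the objective reduces to the mean advantage. Clipping caps the gain from pushing the ratio outside [1 - eps, 1 + eps], which keeps each policy update conservative; this stability under few, noisy reward signals is one reason PPO is a natural fit for a query-limited black-box attack.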

Paper Structure

This paper contains 15 sections, 4 equations, 5 tables, 1 algorithm.