Vicious Classifiers: Assessing Inference-time Data Reconstruction Risk in Edge Computing
Mohammad Malekzadeh, Deniz Gunduz
TL;DR
This work investigates privacy risks in edge computing settings where a server observes only a model's outputs (e.g., the predicted label $\hat{\mathbf{y}}$ or the softmax probabilities) and trains an attack model to reconstruct the user's input ${\mathbf x}$, while target-task accuracy stays high. It formalizes the joint training of a target classifier $\mathcal{F}$ and an attack decoder $\mathcal{G}$ by minimizing $L^{\mathcal{F}} = \beta^{C}L^{C}(\hat{\mathbf{y}}, {\mathbf y}) + \beta^{R}L^{R}(\widetilde{\mathbf x}, {\mathbf x})$, where $\widetilde{\mathbf x}$ denotes the decoder's reconstruction and the reconstruction loss $L^{R}$ blends SSIM and Huber losses. The authors introduce a Mahalanobis-distance-based reconstruction risk $\mathtt{R}$ to quantify privacy loss and show, across six benchmark datasets, that the input can be substantially reconstructed from the outputs of a single inference even when target accuracy remains high. They also propose an initial defense that distinguishes vicious from honest models at inference time, and they release their code. The work exposes a practical privacy risk in edge ML services and provides a principled framework and a baseline defense to guide future research on mitigating inference-time data reconstruction.
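To make the objective concrete, the NumPy sketch below evaluates $L^{\mathcal{F}}$ for a single example. Several details are assumptions rather than the paper's specification: $L^{C}$ is taken to be cross-entropy, SSIM is computed over one global window instead of the usual sliding window, and $L^{R}$ is a convex blend $\alpha(1-\mathrm{SSIM}) + (1-\alpha)\,\mathrm{Huber}$ with hypothetical parameters `alpha` and `delta`. The sketch only evaluates the loss; in the actual attack, $\mathcal{F}$ and $\mathcal{G}$ are networks trained jointly by gradient descent on this kind of objective.

```python
import numpy as np


def cross_entropy(probs, label, eps=1e-12):
    """L^C (assumed cross-entropy): negative log-probability of the true class."""
    return float(-np.log(probs[label] + eps))


def huber(x_rec, x, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear beyond `delta`."""
    diff = np.abs(x_rec - x)
    quad = 0.5 * diff ** 2
    lin = delta * (diff - 0.5 * delta)
    return float(np.mean(np.where(diff <= delta, quad, lin)))


def global_ssim(x_rec, x, data_range=1.0):
    """SSIM computed over the whole image as one window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = x_rec.mean(), x.mean()
    var_a, var_b = x_rec.var(), x.var()
    cov = np.mean((x_rec - mu_a) * (x - mu_b))
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)


def reconstruction_loss(x_rec, x, alpha=0.5, delta=1.0):
    """L^R (assumed form): alpha * (1 - SSIM) + (1 - alpha) * Huber."""
    ssim_term = 1.0 - global_ssim(x_rec, x)
    return alpha * ssim_term + (1.0 - alpha) * huber(x_rec, x, delta)


def vicious_objective(probs, label, x_rec, x, beta_c=1.0, beta_r=1.0):
    """L^F = beta^C * L^C(y_hat, y) + beta^R * L^R(x_tilde, x)."""
    return beta_c * cross_entropy(probs, label) + beta_r * reconstruction_loss(x_rec, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((28, 28))                  # stand-in grayscale input in [0, 1]
    x_noisy = np.clip(x + 0.1 * rng.standard_normal(x.shape), 0.0, 1.0)
    probs = np.array([0.1, 0.7, 0.2])         # stand-in softmax output; true class is 1
    print("perfect reconstruction:", vicious_objective(probs, 1, x, x))
    print("noisy reconstruction:  ", vicious_objective(probs, 1, x_noisy, x))
```

With a perfect reconstruction the SSIM term is 1 and the Huber term is 0, so $L^{R}$ vanishes and only the classification term remains; raising $\beta^{R}$ relative to $\beta^{C}$ trades target accuracy for reconstruction quality.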
Abstract
Privacy-preserving inference in edge computing paradigms encourages users of machine-learning services to run a model locally on their private input and share only the model's outputs for a target task with the server. We study how a vicious server can reconstruct the input data by observing only the model's outputs, while keeping the target accuracy very close to that of an honest server, by jointly training a target model (to run on the user's side) and an attack model for data reconstruction (to be used secretly on the server's side). We present a new measure to assess the inference-time reconstruction risk. Evaluations on six benchmark datasets show that the model's input can be approximately reconstructed from the outputs of a single inference. We propose a preliminary defense mechanism to distinguish vicious from honest classifiers at inference time. By studying such a risk associated with emerging ML services, our work has implications for enhancing privacy in edge computing. We discuss open challenges and directions for future studies, and release our code as a benchmark for the community at https://github.com/mmalekzadeh/vicious-classifiers.
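How the proposed risk measure is computed is not spelled out here; the TL;DR only states that it is based on the Mahalanobis distance. Purely to illustrate that ingredient, the sketch below fits an inverse covariance on reference inputs and scores a reconstruction by the fraction of reference inputs lying farther (in Mahalanobis distance) from the true input than the reconstruction does. The scoring rule `mahalanobis_rank_risk`, the `ridge` regularizer, and the toy data are hypothetical; this is not the paper's $\mathtt{R}$.

```python
import numpy as np


def fit_inverse_covariance(reference, ridge=1e-3):
    """Inverse covariance of flattened reference inputs; `ridge` keeps it invertible."""
    flat = reference.reshape(len(reference), -1)
    cov = np.cov(flat, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])
    return np.linalg.inv(cov)


def mahalanobis(a, b, inv_cov):
    """Mahalanobis distance between two inputs under the fitted covariance."""
    d = (np.asarray(a) - np.asarray(b)).ravel()
    return float(np.sqrt(d @ inv_cov @ d))


def mahalanobis_rank_risk(x, x_rec, reference, inv_cov):
    """Hypothetical risk score in [0, 1].

    Fraction of reference inputs that are farther from x than the
    reconstruction x_rec is; 1.0 means x_rec is closer to x than any of them.
    """
    d_rec = mahalanobis(x, x_rec, inv_cov)
    d_ref = np.array([mahalanobis(x, r, inv_cov) for r in reference])
    return float(np.mean(d_ref > d_rec))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((200, 8))             # toy dataset: 200 inputs, 8 features
    x = rng.random(8)                            # the user's private input
    close = x + 0.01 * rng.standard_normal(8)    # a near-perfect reconstruction
    unrelated = rng.random(8)                    # a reconstruction unrelated to x
    inv_cov = fit_inverse_covariance(reference)
    print("close:    ", mahalanobis_rank_risk(x, close, reference, inv_cov))
    print("unrelated:", mahalanobis_rank_risk(x, unrelated, reference, inv_cov))
```

Using the Mahalanobis rather than the Euclidean distance accounts for correlations and differing scales across features. For raw images the pixel covariance is large and often singular, so in practice one would reduce dimensionality first or rely on the ridge term.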
