Vicious Classifiers: Assessing Inference-time Data Reconstruction Risk in Edge Computing

Mohammad Malekzadeh; Deniz Gunduz

TL;DR

This work investigates privacy risks in edge computing where a server observes a model's outputs, such as $\hat{\mathbf{y}}$ or softmax probabilities, and trains an attack model to reconstruct the user's input ${\mathbf x}$ while maintaining target-task accuracy. It formalizes a joint training framework for a target classifier $\mathcal{F}$ and an attack decoder $\mathcal{G}$, optimizing $L^{\mathcal{F}} = \beta^{C}L^{C}(\hat{\mathbf{y}}, {\mathbf y}) + \beta^{R}L^{R}(\widetilde{\mathbf x}, {\mathbf x})$ with a reconstruction loss $L^{R}$ that blends SSIM and Huber losses. The authors introduce a Mahalanobis-distance-based reconstruction risk $\mathtt{R}$ to quantify privacy loss, and demonstrate across six datasets that input data can be substantially reconstructed from a single inference even when target accuracy remains high. They also propose an initial defense to distinguish vicious from honest models at inference time and release their code. The work highlights a practical privacy risk in edge ML services and provides a framework and a baseline defense to guide future research on mitigating inference-time data reconstruction.
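The weighted objective in the TL;DR can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is in the linked repository). The function names, the single-window SSIM simplification, the mixing weight `alpha`, and the Huber threshold `delta` are all assumptions made for illustration. Only the form $\beta^{C}L^{C} + \beta^{R}L^{R}$ and the SSIM-plus-Huber blend come from the text, and the default ratio $\beta^R/\beta^C = 3/1$ is the setting named in the Figure 3 caption.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (stands in for L^C)."""
    z = logits - np.max(logits)              # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -log_probs[label]

def huber(x_rec, x, delta=1.0):
    """Mean Huber loss; delta is an assumed threshold."""
    r = np.abs(x_rec - x)
    return np.mean(np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta)))

def global_ssim(x_rec, x, data_range=1.0):
    """Simplified SSIM over the whole image as one window (assumption:
    practical SSIM implementations use local sliding windows)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = x_rec.mean(), x.mean()
    var_a, var_b = x_rec.var(), x.var()
    cov = np.mean((x_rec - mu_a) * (x - mu_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )

def reconstruction_loss(x_rec, x, alpha=0.5):
    """Blend of (1 - SSIM) and Huber (stands in for L^R); alpha is hypothetical."""
    return alpha * (1.0 - global_ssim(x_rec, x)) + (1.0 - alpha) * huber(x_rec, x)

def joint_loss(logits, label, x_rec, x, beta_c=1.0, beta_r=3.0):
    """L^F = beta^C * L^C + beta^R * L^R."""
    return beta_c * cross_entropy(logits, label) + beta_r * reconstruction_loss(x_rec, x)
```

With a perfect reconstruction the second term vanishes and the objective reduces to the classification loss alone, so increasing `beta_r` relative to `beta_c` shifts the trade-off toward reconstruction quality.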

Abstract

Privacy-preserving inference in edge computing paradigms encourages users of machine-learning services to run a model locally on their private input and share only the model's outputs for a target task with the server. We study how a vicious server can reconstruct the input data by observing only the model's outputs while keeping the target accuracy very close to that of an honest server, by jointly training a target model (to run on the user's side) and an attack model for data reconstruction (to use secretly on the server's side). We present a new measure to assess the inference-time reconstruction risk. Evaluations on six benchmark datasets show that the model's input can be approximately reconstructed from the outputs of a single inference. We propose a primary defense mechanism to distinguish vicious from honest classifiers at inference time. By studying this risk associated with emerging ML services, our work has implications for enhancing privacy in edge computing. We discuss open challenges and directions for future studies and release our code as a benchmark for the community at https://github.com/mmalekzadeh/vicious-classifiers .
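The TL;DR describes the new measure as based on the Mahalanobis distance, but this page does not give the definition of $\mathtt{R}$. The sketch below therefore shows only the generic Mahalanobis distance between an input and its reconstruction, under an assumed covariance matrix. The function name, the `eps` regularizer, and the choice of covariance are hypothetical, and how the paper maps this distance to a risk score is not reproduced here.

```python
import numpy as np

def mahalanobis_distance(x, x_rec, cov, eps=1e-6):
    """Generic Mahalanobis distance between x and its reconstruction x_rec.

    cov: (d, d) covariance matrix for flattened inputs (an assumed choice;
         it could be estimated from data with np.cov(X, rowvar=False)).
    eps: small ridge term added to the diagonal so the linear solve stays
         stable when cov is close to singular (an assumption of this sketch).
    """
    d = (np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float)).ravel()
    cov_reg = cov + eps * np.eye(cov.shape[0])
    # Solve cov_reg @ v = d instead of forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(cov_reg, d)))
```

With an identity covariance the value reduces to the Euclidean distance. A non-trivial covariance rescales each direction of the error by how much the data varies along it.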

Paper Structure (20 sections, 20 equations, 6 figures, 8 tables, 1 algorithm)

Introduction
Methodology
Reconstruction Risk
Experimental Results
Discussion
Conclusion
Threat Model
Related Work
An Information-Theoretical View
Algorithm
Experimental Setup
Datasets
Models and Training
Settings
Additional Results
...and 5 more sections

Figures (6)

Figure 1: After the user's input ${\mathbf x}$ is processed, the server receives only the output $\hat{\mathbf{y}} = \mathcal{F}({\mathbf x})$. We show that for any model $\mathcal{F}$, the server can train an attack model $\mathcal{G}$ to secretly reconstruct the input from the observed output, while providing the target service to the user with high accuracy.
Figure 2: $\mathcal{F}$ is the target model, $\mathcal{G}$ is the attack model, $L^C$ is the classification loss, $L^R$ is the attack reconstruction loss. Both $\mathcal{F}$ and $\mathcal{G}$ are DNNs. Hyperparameters $\beta^{C}$ and $\beta^{R}$ control the trade-offs between classification and reconstruction tasks.
Figure 3: Examples of image reconstruction from the logit outputs for $\beta^R/\beta^C = 3/1$ in the main results table, for the MNIST, FMNIST, CIFAR10, CIFAR100, TinyImageNet, and CelebA datasets (from top to bottom). For each dataset, the first row shows the original images and the second row shows the data reconstructed by the attacker.
Figure 4: More qualitative examples for the same experimental settings as in Figure 3.
Figure 5: A vicious model, trained to perform a secret task, is expected to deviate more from the optimal solution for the target task, compared to an honest model.
...and 1 more figure
