Securing Transfer-Learned Networks with Reverse Homomorphic Encryption

Robert Allison, Tomasz Maciążek, Henry Bourne

TL;DR

The paper addresses training-data privacy in neural networks trained on sensitive data, showing that differential privacy via DP-SGD can fail to defend against few-shot transfer-learning reconstruction attacks without substantial utility loss. It introduces Reverse Homomorphic Encryption (RHE), which encrypts the transfer-learned head while keeping inputs and the base model unencrypted, thereby protecting training data against white-box and black-box reconstructions while preserving accuracy and enabling practical inference. The authors develop powerful white-box and hard-label black-box attacks under a realistic threat model and evaluate them with a Neyman–Pearson ROC framework, demonstrating strong reconstruction capabilities under DP-SGD. RHE is shown to be a practical, minimal-encryption defense that blocks reconstruction, membership, and property-inference attacks with competitive inference times on standard hardware. These results underscore the importance of data-centric defenses in TL and point to RHE as a viable privacy-preserving regime for few-shot TL deployments.

Abstract

The growing body of literature on training-data reconstruction attacks raises significant concerns about deploying neural network classifiers trained on sensitive data. However, differentially private (DP) training (e.g. using DP-SGD) can defend against such attacks when training datasets are large, at only minimal loss of network utility. Folklore, heuristics, and (albeit pessimistic) DP bounds suggest this fails for networks trained with small per-class datasets, yet to the best of our knowledge the literature offers no compelling evidence. We directly demonstrate this vulnerability by significantly extending reconstruction attack capabilities under a realistic adversary threat model for few-shot transfer-learned image classifiers. We design new white-box and black-box attacks and find that DP-SGD is unable to defend against these without significant classifier utility loss. To address this, we propose a novel homomorphic encryption (HE) method that protects training data without degrading the model's accuracy. Conventional HE secures the model's input data and requires a costly homomorphic implementation of the entire classifier. In contrast, our new scheme is computationally efficient and protects training data rather than input data. This is achieved by means of a simple role reversal in which classifier input data is unencrypted but transfer-learned weights are encrypted. Classifier outputs remain encrypted, thus preventing both white-box and black-box (and any other) training-data reconstruction attacks. Under this new scheme, only a trusted party with a private decryption key can obtain the classifier's class decisions.
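The role reversal described in the abstract can be illustrated with a small sketch. The paper's figure list indicates that RHE is configured with TenSEAL; the code below does not use TenSEAL and is not the authors' implementation. It assumes, purely for illustration, a toy Paillier cryptosystem with tiny, insecure primes, an integer-quantized linear head, and hypothetical names (`encrypt_head`, `encrypted_logits`). It shows only the core idea: plaintext features are combined with encrypted weights, and the resulting logits can be read only by the holder of the private key.

```python
import math
import random

# Toy Paillier parameters. ASSUMPTION: tiny primes for illustration only;
# this offers no real security and is not the scheme used in the paper.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key part
MU = pow(LAM, -1, N)                               # private key part (g = N + 1)


def encrypt(m):
    """Encrypt a (possibly negative) integer under the public key N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 equals 1 + m*N mod N^2.
    return ((1 + (m % N) * N) % N2) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private key and map back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def encrypt_head(weights, biases):
    """Trusted party: encrypt every weight and bias of the linear head."""
    enc_w = [[encrypt(w) for w in row] for row in weights]
    enc_b = [encrypt(b) for b in biases]
    return enc_w, enc_b


def encrypted_logits(features, enc_w, enc_b):
    """Untrusted server: plaintext features times encrypted weights.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    plaintext power k multiplies its plaintext by k.
    """
    out = []
    for row, bias in zip(enc_w, enc_b):
        acc = bias
        for x, cw in zip(features, row):
            acc = acc * pow(cw, x % N, N2) % N2
        out.append(acc)
    return out


# Hypothetical quantized head (2 classes, 3 features) and base-model features.
W = [[2, 0, -1], [1, 4, 1]]
b = [1, -2]
features = [3, -1, 2]  # unencrypted output of the frozen base model

enc_W, enc_b = encrypt_head(W, b)
enc_out = encrypted_logits(features, enc_W, enc_b)

# Only the key holder can recover the logits and the class decision.
logits = [decrypt(c) for c in enc_out]
print(logits)  # [5, -1]
```

In this sketch the server never sees the head's weights or the logits in the clear, which mirrors the property stated above that classifier outputs remain encrypted. A practical deployment would use a vetted HE library and real-valued encodings rather than this integer toy.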

Paper Structure

This paper contains 28 sections, 3 theorems, 41 equations, 19 figures, 12 tables, 3 algorithms.

Key Result

Theorem 1

Assume that there exists a strictly increasing differentiable transformation $\Phi$ such that $\Phi(\ell_{\min})\sim\mathcal{N}\left(\mu_0,\sigma_0^2\right)$ if $\ell_{\min}\sim \rho_0$ and $\Phi(\ell_{\min})\sim\mathcal{N}\left(\mu_1,\sigma_1^2\right)$ if $\ell_{\min}\sim \rho_1$ with $\mu_0 < \mu_

Figures (19)

  • Figure 1: Reconstructions obtained via our attack with CIFAR-10, MNIST (reconstructed at $32\times 32$ resolution) and CelebA (reconstructed at $64\times 64$ resolution). Reconstruction "success" is determined via the Neyman–Pearson criterion at reconstruction $\mathrm{FPR}=1\%$; see Section \ref{sec:rero}. TL training set size $N=10$.
  • Figure 2: Privacy-utility tradeoff for the MNIST and CIFAR-10 experiments: a) white-box attack, b) black-box attack. All models suffer severe accuracy degradation even for relatively large values of $\epsilon$.
  • Figure 3: Reconstruction example for CIFAR-10 and $(\epsilon,\delta)$-DP models for training set size $N=10$. Non-private training means $\epsilon=\infty$.
  • Figure 4: (a) Conventional HE usage when performing TL neural network predictions. (b) How RHE is implemented.
  • Figure 5: Steps for configuring RHE with TenSEAL.
  • ...and 14 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 2
  • proof