Securing Transfer-Learned Networks with Reverse Homomorphic Encryption
Robert Allison, Tomasz Maciążek, Henry Bourne
TL;DR
The paper addresses training-data privacy in neural networks trained on sensitive data, showing that differential privacy via DP-SGD can fail to defend against reconstruction attacks on few-shot transfer-learned classifiers without substantial utility loss. It introduces Reverse Homomorphic Encryption (RHE), which encrypts the transfer-learned head while keeping inputs and the base model unencrypted, thereby protecting training data against white-box and black-box reconstruction attacks while preserving accuracy and enabling practical inference. The authors develop powerful white-box and hard-label black-box attacks under a realistic threat model and evaluate them with a Neyman–Pearson ROC framework, demonstrating strong reconstruction capabilities under DP-SGD. RHE is shown to be a practical, minimal-encryption defense that blocks reconstruction, membership, and property-inference attacks with competitive inference times on standard hardware. These results underscore the importance of data-centric defenses in transfer learning (TL) and point to RHE as a viable privacy-preserving regime for few-shot TL deployments.
Abstract
The growing body of literature on training-data reconstruction attacks raises significant concerns about deploying neural network classifiers trained on sensitive data. However, for large training datasets, differentially private (DP) training (e.g. using DP-SGD) can defend against such attacks with only minimal loss of network utility. Folklore, heuristics, and (albeit pessimistic) DP bounds suggest this defense fails for networks trained with small per-class datasets, yet to the best of our knowledge the literature offers no compelling evidence. We directly demonstrate this vulnerability by significantly extending reconstruction attack capabilities under a realistic adversary threat model for few-shot transfer-learned image classifiers. We design new white-box and black-box attacks and find that DP-SGD is unable to defend against them without significant loss of classifier utility. To address this, we propose a novel homomorphic encryption (HE) method that protects training data without degrading the model's accuracy. Conventional HE secures the model's input data and requires a costly homomorphic implementation of the entire classifier. In contrast, our new scheme is computationally efficient and protects training data rather than input data. This is achieved by means of a simple role reversal in which the classifier's input data is unencrypted but the transfer-learned weights are encrypted. Classifier outputs remain encrypted, thus preventing both white-box and black-box (and any other) training-data reconstruction attacks. Under this new scheme, only a trusted party with a private decryption key can obtain the classifier's class decisions.
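The role reversal described in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme: it substitutes a toy Paillier cryptosystem (additively homomorphic) for whatever HE scheme the paper uses, assumes integer-quantized weights and features, and uses deliberately tiny, insecure key sizes. All names (`keygen`, `encrypt`, `decrypt`, `encrypted_head`) and all numeric values are hypothetical. The only point is the data flow: plaintext features from an unencrypted base model are combined with encrypted head weights, and the resulting logits stay encrypted until the key holder decrypts them.

```python
import math
import secrets

# Assumption: toy Paillier with two Mersenne primes. Far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)), using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (negative values are stored modulo n)."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1                   # random blinding factor
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m

def encrypted_head(n, enc_W, enc_b, x):
    """Evaluate logits = W @ x + b with encrypted W, b and plaintext x.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to a
    plaintext power scales the plaintext. The output is still encrypted.
    """
    n2 = n * n
    logits = []
    for row, acc in zip(enc_W, enc_b):
        for w_ct, xi in zip(row, x):
            acc = acc * pow(w_ct, xi % n, n2) % n2
        logits.append(acc)
    return logits

# Trusted party: holds the private key and encrypts the transfer-learned head.
n, priv = keygen()
W = [[3, -2, 5], [-1, 4, 0]]   # hypothetical quantized head weights (2 classes)
b = [1, -7]                    # hypothetical quantized biases
enc_W = [[encrypt(n, w) for w in row] for row in W]
enc_b = [encrypt(n, v) for v in b]

# Untrusted server: sees plaintext features and encrypted weights only.
x = [2, 7, -3]                 # hypothetical features from the unencrypted base model
enc_logits = encrypted_head(n, enc_W, enc_b, x)

# Trusted party: decrypts the logits to obtain the class decision.
logits = [decrypt(n, priv, c) for c in enc_logits]
decision = max(range(len(logits)), key=logits.__getitem__)
```

In this sketch the server never observes the head weights, the logits, or the class decision, so neither a white-box inspection of the weights nor a hard-label query sequence yields usable information without the private key. A practical deployment would need a vetted HE library and secure parameter sizes.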
