Provable Privacy Attacks on Trained Shallow Neural Networks
Guy Smorodinsky, Gal Vardi, Itay Safran
TL;DR
The paper presents rigorous, provable privacy attacks on trained shallow neural networks by leveraging the implicit bias of homogeneous two-layer ReLU networks, which converge to a KKT point of the margin-maximization problem. In one dimension, it proves that an attacker can, with constant probability, reconstruct a set of points of which at least a constant fraction are training points. In high dimensions, it proves a high-probability membership inference attack when the training vectors are nearly orthogonal. The results connect implicit bias to concrete privacy vulnerabilities, discuss relations to differential privacy and benign overfitting, and are complemented by experiments showing that the vulnerability extends beyond the strict theoretical regimes. Overall, the work gives what the authors believe to be the first rigorous guarantees of privacy leakage driven by implicit bias in trained shallow networks, and it motivates further study of privacy in neural networks and of defenses against such attacks.
Abstract
We study what provable privacy attacks can be shown on trained 2-layer ReLU neural networks. We explore two types of attacks: data reconstruction attacks and membership inference attacks. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used to provably reconstruct a set of which at least a constant fraction are training points in a univariate setting, and can also be used to identify with high probability whether a given point was used in the training set in a high-dimensional setting. To the best of our knowledge, our work is the first to show provable vulnerabilities in this implicit-bias-driven setting.
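
For context, the implicit-bias result that both attacks build on is usually stated in the following form. This is a sketch in assumed notation that the abstract itself does not fix: a homogeneous network $N(\theta; x)$ with parameters $\theta$, a training set $\{(x_i, y_i)\}_{i=1}^n$ with labels $y_i \in \{-1, +1\}$, and multipliers $\lambda_i$.

```latex
% Margin-maximization problem whose KKT points the trained network
% converges to in direction (notation assumed, not taken from the abstract).
\min_{\theta} \; \tfrac{1}{2}\,\|\theta\|^2
\quad \text{s.t.} \quad
y_i \, N(\theta; x_i) \ge 1 \quad \text{for all } i \in \{1, \dots, n\}

% KKT conditions: stationarity, dual feasibility, complementary slackness.
% For ReLU networks the gradient is understood as a subgradient.
\theta = \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_{\theta} N(\theta; x_i),
\qquad
\lambda_i \ge 0,
\qquad
\lambda_i \bigl( y_i \, N(\theta; x_i) - 1 \bigr) = 0
\quad \text{for all } i
```

The stationarity condition writes the trained parameters as a combination of gradients evaluated at training points, with nonzero weight only on points that lie exactly on the margin. This dependence of the parameters on individual training points is the kind of structure a privacy attack can exploit.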
