Provable Privacy Attacks on Trained Shallow Neural Networks

Guy Smorodinsky, Gal Vardi, Itay Safran

TL;DR

The paper presents rigorous, provable privacy attacks on trained shallow neural networks by leveraging the implicit bias of homogeneous two-layer ReLU networks, which converge in direction to a KKT point of a margin-maximization problem. In one dimension, it proves that an attacker can reconstruct a set of points of which at least a constant fraction are training points; in high dimensions, it proves a membership inference attack that succeeds with high probability when the training vectors are nearly orthogonal. The results connect implicit bias to concrete privacy vulnerabilities, discuss relations to differential privacy and benign overfitting, and are complemented by experiments showing that the vulnerability extends beyond the strict theoretical regimes. Overall, this work establishes the first rigorous guarantees of privacy leakage driven by implicit bias in trained shallow networks and motivates further study of privacy and defenses in neural networks.
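
A trained univariate two-layer ReLU network is piecewise linear, and its breaking points are natural candidates for training inputs in the one-dimensional reconstruction setting. The sketch below is a minimal illustration under our own assumptions, not the paper's algorithm: it takes a network of the form $\Phi(x) = \sum_j v_j \,\mathrm{relu}(w_j x + b_j)$ with hypothetical parameter arrays `w`, `b`, `v`, and returns the points where the slope actually changes. The guarantee that a constant fraction of such candidates are training points comes from the paper's analysis, not from this code.

```python
import numpy as np

def candidate_points(w, b, v, tol=1e-9):
    """Sorted breaking points of x -> sum_j v_j * relu(w_j * x + b_j) (hypothetical helper).

    Neuron j can only bend the function at x = -b_j / w_j (requires w_j != 0).
    Crossing that point from left to right changes the slope by v_j * |w_j|
    (for w_j > 0 and for w_j < 0 alike). Neurons sharing a breaking point are
    merged, and the point is kept only if the total slope change is nonzero.
    """
    w, b, v = (np.asarray(a, dtype=float) for a in (w, b, v))
    active = np.abs(w) > tol                 # neurons that can create a kink
    kinks = -b[active] / w[active]           # kink location of each neuron
    jumps = v[active] * np.abs(w[active])    # slope change at each kink
    order = np.argsort(kinks)
    kinks, jumps = kinks[order], jumps[order]

    points = []
    i = 0
    while i < len(kinks):
        j, total = i, 0.0
        # Merge neurons whose kinks coincide up to tol.
        while j < len(kinks) and kinks[j] - kinks[i] <= tol:
            total += jumps[j]
            j += 1
        if abs(total) > tol:                 # slope really changes here
            points.append(float(kinks[i]))
        i = j
    return points
```

For example, `candidate_points([1.0, 1.0, 2.0], [-1.0, -1.0, -6.0], [1.0, -1.0, 1.0])` returns `[3.0]`: the two neurons bending at $x = 1$ cancel each other, so only $x = 3$ is a genuine breaking point.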

Abstract

We study what provable privacy attacks can be shown on trained 2-layer ReLU neural networks. We explore two types of attacks: data reconstruction attacks and membership inference attacks. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used, in a univariate setting, to provably reconstruct a set of points of which at least a constant fraction are training points, and, in a high-dimensional setting, to identify with high probability whether a given point was used in the training set. To the best of our knowledge, our work is the first to show provable vulnerabilities in this implicit-bias-driven setting.
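
The high-dimensional membership test rests on a margin gap: at a KKT point of the margin-maximization problem, many training points sit on the margin, while unseen points tend to fall strictly inside it. Below is a minimal sketch of such a threshold test, assuming the attacker can query $\Phi$ and knows the margin value; the bias-free network form, the helper names `two_layer_relu` and `is_member`, and the `slack` parameter are hypothetical, and this is not the paper's exact procedure.

```python
import numpy as np

def two_layer_relu(W, v, X):
    """Phi(x) = sum_j v_j * relu(<w_j, x>) for each row x of X; W has shape (m, d)."""
    hidden = np.maximum(np.asarray(X, float) @ np.asarray(W, float).T, 0.0)
    return hidden @ np.asarray(v, float)

def is_member(W, v, x, margin=1.0, slack=0.1):
    """Hypothetical margin-threshold membership test.

    Flags x as a training point when |Phi(x)| reaches at least (1 - slack)
    times the assumed-known margin value; points well inside the margin are
    declared non-members.
    """
    out = two_layer_relu(W, v, np.atleast_2d(x))[0]
    return bool(abs(out) >= (1.0 - slack) * margin)
```

With `W = [[1, 0], [0, 1]]` and `v = [1, -1]`, the point `[1.0, 0.0]` has output 1 and is flagged as a member, while `[0.2, 0.0]` has output 0.2 and is not. Using $|\Phi(x)|$ rather than $y\,\Phi(x)$ means the test needs no label for the queried point.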

Paper Structure

This paper contains 23 sections, 20 theorems, 86 equations, 4 figures, 1 algorithm.

Key Result

Theorem 2.1

Let $\Phi(\boldsymbol{\theta}; x)$ be a homogeneous ReLU neural network. Consider minimizing the logistic ($z\mapsto \log(1+e^{-z})$) or the exponential ($z\mapsto e^{-z}$) loss using gradient flow (a continuous-time analog of gradient descent) over a binary classification set $\{(x_i, y_i)\}_{i=1}^{n}$ …
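
As background (stated in the standard form of this line of implicit-bias results, not quoted from the paper's theorem), the "margin-maximizing KKT point" refers to the parameter-space maximum-margin problem below, together with its KKT conditions, where $\partial^\circ$ denotes the Clarke subdifferential used for non-smooth ReLU networks. Complementary slackness is what places every training point with $\lambda_i > 0$ exactly on the margin.

```latex
\begin{aligned}
\min_{\boldsymbol{\theta}} \quad & \tfrac{1}{2}\|\boldsymbol{\theta}\|^2 \\
\text{s.t.} \quad & y_i\,\Phi(\boldsymbol{\theta}; x_i) \ge 1 \quad \forall i \in [n],
\end{aligned}
\qquad
\begin{aligned}
& \boldsymbol{\theta} = \sum_{i=1}^{n} \lambda_i \boldsymbol{h}_i, \quad \boldsymbol{h}_i \in \partial^\circ\big(y_i\,\Phi(\boldsymbol{\theta}; x_i)\big), \\
& \lambda_i \ge 0, \quad \lambda_i\big(y_i\,\Phi(\boldsymbol{\theta}; x_i) - 1\big) = 0 \quad \forall i \in [n].
\end{aligned}
```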

Figures (4)

  • Figure 1: The percentage of training points that lie on the margin (up to a slack of 10%) increases as the dimension increases.
  • Figure 2: The percentage of test points that lie on or above the margin drops to zero for sufficiently large input dimensions, much earlier than what our theory predicts.
  • Figure 3: A closer look at the smaller values. The percentage of test points that lie on or above the margin decreases rapidly as the dimension increases.
  • Figure 4: The blue network is one whose breaking point is not a training point. The dotted-red network has a smaller norm.
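
The quantities plotted in Figures 1-3 can be computed from network outputs in a few lines. This is a minimal sketch under stated assumptions: the margin is taken to be the smallest training value of $y_i\,\Phi(x_i)$, "on the margin up to a slack of 10%" is read as $y_i\,\Phi(x_i) \le 1.1$ times that value, and a test point counts as "on or above the margin" when $|\Phi(x)|$ reaches it. The function name `margin_statistics` is hypothetical, and the exact conventions in the paper's experiments may differ.

```python
import numpy as np

def margin_statistics(train_out, train_y, test_out, slack=0.1):
    """Return (% of training points on the margin, % of test points on or above it).

    train_out, test_out: network outputs Phi(x); train_y: labels in {-1, +1}.
    Assumes every training point is correctly classified, so the margin is > 0.
    """
    train_margins = np.asarray(train_y, float) * np.asarray(train_out, float)
    gamma = train_margins.min()  # margin value of the trained network
    if gamma <= 0:
        raise ValueError("training set is not separated by the network")
    # Training points within a relative slack of the margin.
    on_margin = np.mean(train_margins <= (1.0 + slack) * gamma)
    # Test points whose |output| reaches the margin (no test labels needed).
    above = np.mean(np.abs(np.asarray(test_out, float)) >= gamma)
    return float(100 * on_margin), float(100 * above)
```

For instance, training outputs `[1.0, 1.05, 2.0, -1.0]` with labels `[1, 1, 1, -1]` and test outputs `[0.5, -1.2, 0.99, 3.0]` give `(75.0, 50.0)`.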

Theorems & Definitions (43)

  • Theorem 2.1: paraphrased version of KKT2019, KKT2020
  • Theorem 3.1
  • proof
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Remark 4.1: Black box attacks
  • Theorem 4.2
  • Corollary 4.3: Known margin value
  • proof
  • ...and 33 more