Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy

Rob Romijnders, Antti Koskela

TL;DR

This work addresses privacy for neural networks under the hidden-state threat model, where only the final trained model is released. It combines convex duality for two-layer ReLU networks with privacy amplification by iteration to derive DP guarantees for NoisyCGD using fixed disjoint mini-batches, achieving privacy-utility trade-offs comparable to DP-SGD on a 2-layer ReLU network. Under a random-data regime, the authors prove utility bounds of order $\tilde{O}\left( 1/(\sqrt{n} \varepsilon) \right)$ for the convex approximation, and demonstrate empirically that NoisyCGD matches DP-SGD performance on benchmark tasks like MNIST and CIFAR-10 while offering practical computational advantages. The results suggest that convex formulations can enable effective hidden-state DP in neural models and point to directions for extending these ideas to deeper or convolutional architectures.

Abstract

The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. However, the current privacy analyses under this model are restricted to convex optimization problems, which limits their applicability to multi-layer neural networks, essential in modern deep learning applications. Notably, the most successful applications of hidden state privacy analyses in classification tasks have been limited to logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of 2-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). This is achieved through a stochastic approximation of a dual formulation of the ReLU minimization problem, resulting in a strongly convex problem. This enables the use of existing hidden state privacy analyses and also provides accurate privacy bounds for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Empirical results on benchmark classification tasks demonstrate that NoisyCGD can achieve privacy-utility trade-offs on par with DP-SGD applied to 2-layer ReLU networks.
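To make the NoisyCGD idea concrete, here is a minimal sketch of noisy cyclic mini-batch gradient descent with fixed disjoint mini-batches. It is an illustration under stated assumptions, not the authors' implementation: L2-regularized logistic regression stands in for the paper's strongly convex dual problem, and the per-example clipping rule, the noise scaling `sigma * clip`, and all function and parameter names are hypothetical choices made for this example.

```python
import numpy as np

def noisy_cgd(X, y, batch_size, epochs, lr, lam, clip, sigma, seed=0):
    """Noisy cyclic mini-batch GD on L2-regularized logistic regression.

    Assumptions (not from the paper): labels in {0, 1}, per-example
    gradient clipping to norm `clip`, Gaussian noise with std sigma * clip.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Fixed disjoint mini-batches: partition the data once and
    # visit the batches in the same order in every epoch.
    perm = rng.permutation(n)
    batches = [perm[i:i + batch_size] for i in range(0, n, batch_size)]
    theta = np.zeros(d)
    for _ in range(epochs):
        for idx in batches:
            Xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-Xb @ theta))
            per_ex = (p - yb)[:, None] * Xb  # per-example loss gradients
            norms = np.linalg.norm(per_ex, axis=1)
            scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
            per_ex = per_ex * scale[:, None]
            noise = rng.normal(0.0, sigma * clip, size=d)
            # The L2 term lam * theta makes the objective strongly convex.
            grad = (per_ex.sum(axis=0) + noise) / len(idx) + lam * theta
            theta = theta - lr * grad
    # Hidden-state model: only the final iterate is released.
    return theta
```

The point of the sketch is the batching structure: unlike DP-SGD with random subsampling, the partition is drawn once and reused cyclically, and the privacy analysis applies to the final `theta` rather than to the whole sequence of iterates.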

Paper Structure

This paper contains 36 sections, 12 theorems, 33 equations, 9 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 2.3

If $(P,Q)$ dominates $\mathcal{M}$ and $(P',Q')$ dominates $\mathcal{M}'$, then $(P \times P',Q \times Q')$ dominates the adaptive composition $\mathcal{M} \circ \mathcal{M}'$.
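As a worked illustration of the composition theorem, consider the standard Gaussian case. A Gaussian mechanism with sensitivity 1 and noise standard deviation $\sigma$ is dominated by the pair $(\mathcal{N}(0,\sigma^2), \mathcal{N}(1,\sigma^2))$. Composing $k$ such mechanisms gives the product pair, whose privacy profile equals that of $(\mathcal{N}(0,1), \mathcal{N}(\mu,1))$ with $\mu = \sqrt{k}/\sigma$. The sketch below evaluates the resulting $\delta(\varepsilon)$ using the well-known closed form for Gaussian pairs; the function names are hypothetical and the example is not taken from the paper.

```python
from math import erf, exp, sqrt

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gaussian_delta(eps, mu):
    """delta(eps) for the dominating pair (N(0, 1), N(mu, 1))."""
    return norm_cdf(-eps / mu + mu / 2.0) - exp(eps) * norm_cdf(-eps / mu - mu / 2.0)

def composed_mu(sigma, k):
    """Effective mu after composing k Gaussian mechanisms with
    sensitivity 1 and noise std sigma (product of k dominating pairs)."""
    return sqrt(k) / sigma

# Example: delta at eps = 1 after 4 and after 16 compositions with sigma = 2.
delta_4 = gaussian_delta(1.0, composed_mu(2.0, 4))
delta_16 = gaussian_delta(1.0, composed_mu(2.0, 16))
```

Because the theorem covers adaptive composition, the same product pair bounds the privacy loss even when each mechanism's input depends on the outputs of the earlier ones.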

Figures (9)

  • Figure 1: Values of the product of the learning rate $\eta$ and the $L_2$-regularization constant $\lambda$ that lead to tighter privacy bounds for the final model using Thm. 2.7, compared to the whole sequence of updates using the DP-SGD analysis. Here $n=6 \cdot 10^4$, $b=1000$, $\sigma=15.0$ and $\delta=10^{-5}$.
  • Figure 2: MNIST: Test accuracy versus the spent privacy budget $\varepsilon$, when each model is trained for 400 epochs. NoisyCGD and DP-SGD generally have comparable performance for the 2-layer ReLU network and much higher accuracy than logistic regression.
  • Figure 3: CIFAR-10: Test accuracy versus the spent privacy budget $\varepsilon$, when each model is trained for 400 epochs. NoisyCGD and DP-SGD generally have comparable performance for the 2-layer ReLU network and much higher accuracy than logistic regression.
  • Figure 4: Test accuracies vs. number of epochs, when all models are trained using SGD with batch size 1000. The number of random hyperplanes $P$ varies for the stochastic dual problem. The ReLU network is a 2-layer fully connected ReLU network with a hidden-layer width of 200. Cross-entropy loss is used for all models.
  • Figure 5: Test accuracies vs. number of epochs, when all models are trained using DP-SGD with batch size 1000, for two different noise levels $\sigma$. The number of random hyperplanes $P$ varies for the stochastic dual problem. The ReLU network is a 2-layer fully connected ReLU network with hidden-layer width 200.
  • ...and 4 more figures
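
The captions above refer to a stochastic dual problem with $P$ random hyperplanes. The sketch below shows one common way such a model can be written: activation patterns are fixed by random hyperplanes, so the prediction is linear in the trainable weights and any convex loss of it stays convex. This is an assumed reading for illustration only; the paper's exact formulation (constraints, regularizer, multi-class outputs) may differ, and the names `random_hyperplanes`, `gated_predict`, `U`, and `V` are hypothetical.

```python
import numpy as np

def random_hyperplanes(d, P, seed=0):
    """Draw P random hyperplane normals in R^d (fixed, never trained)."""
    return np.random.default_rng(seed).normal(size=(d, P))

def gated_predict(X, U, V):
    """Scalar prediction f(x_j) = sum_i 1[<x_j, u_i> >= 0] * <x_j, v_i>.

    X: (n, d) inputs, U: (d, P) fixed hyperplanes, V: (d, P) trainable weights.
    For fixed U the output is linear in V.
    """
    gates = (X @ U >= 0).astype(float)  # (n, P) activation patterns
    return np.sum(gates * (X @ V), axis=1)
```

Since the gates depend only on the data and the fixed `U`, a loss such as cross-entropy plus an L2 penalty on `V` is strongly convex in `V`, which is the property the hidden-state analyses need.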

Theorems & Definitions (17)

  • Definition 2.1
  • Definition 2.2: zhu2021optimal
  • Theorem 2.3: zhu2021optimal
  • Lemma 2.4: dong2022gaussian
  • Lemma 2.5: lebeda2024avoiding
  • Definition 2.6
  • Theorem 2.7: bok2024shifted
  • Lemma 3.1
  • Theorem 4.1
  • Lemma C.1
  • ...and 7 more