Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy
Rob Romijnders, Antti Koskela
TL;DR
This work addresses privacy for neural networks under the hidden-state threat model, where only the final trained model is released. It combines convex duality for two-layer ReLU networks with privacy amplification by iteration to derive DP guarantees for NoisyCGD using fixed disjoint mini-batches, achieving privacy-utility trade-offs comparable to DP-SGD on a 2-layer ReLU network. Under a random-data regime, the authors prove utility bounds of order $\tilde{O}\left( 1/(\sqrt{n} \varepsilon) \right)$ for the convex approximation, and demonstrate empirically that NoisyCGD matches DP-SGD performance on benchmark tasks like MNIST and CIFAR-10 while offering practical computational advantages. The results suggest that convex formulations can enable effective hidden-state DP in neural models and point to directions for extending these ideas to deeper or convolutional architectures.
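The NoisyCGD procedure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes an L2-regularized logistic loss (strongly convex, labels in {-1, +1}), per-example gradient clipping to norm `clip`, and Gaussian noise with standard deviation `sigma * clip` added to each summed mini-batch gradient. The names `noisy_cgd` and `clipped_logistic_grads` and all hyperparameters are hypothetical. The mini-batches are fixed once and visited in the same order every epoch, and only the final iterate is returned, matching the hidden-state threat model. Calibrating `sigma` to a target (ε, δ) requires a separate privacy analysis, which is not shown.

```python
import numpy as np


def clipped_logistic_grads(theta, X, y, clip):
    """Per-example logistic-loss gradients, each clipped to L2 norm <= clip (labels in {-1, +1})."""
    margins = y * (X @ theta)
    # d/dtheta log(1 + exp(-m)) = -y * x * sigmoid(-m); sigmoid(-m) = 0.5 * (1 - tanh(m / 2)) avoids overflow
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    grads = coef[:, None] * X
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return grads * scale


def noisy_cgd(X, y, num_batches, epochs, lr, clip, sigma, lam, seed=0):
    """Hypothetical NoisyCGD sketch over fixed disjoint mini-batches.

    Returns only the final iterate (hidden-state threat model); intermediate
    iterates are never exposed to the caller.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Partition once; the same batches are visited in the same order every epoch.
    batches = np.array_split(np.arange(n), num_batches)
    theta = np.zeros(d)
    for _ in range(epochs):
        for idx in batches:
            g = clipped_logistic_grads(theta, X[idx], y[idx], clip).sum(axis=0)
            # Gaussian noise on the summed clipped gradient, scaled to the clipping norm
            g = g + rng.normal(0.0, sigma * clip, size=d)
            g = g / len(idx) + lam * theta  # lam * theta is the gradient of (lam / 2) ||theta||^2
            theta = theta - lr * g
    return theta
```

For example, `noisy_cgd(X, y, num_batches=10, epochs=20, lr=0.5, clip=1.0, sigma=1.0, lam=1e-3)` runs 20 cyclic passes over 10 fixed batches. Unlike DP-SGD, there is no random subsampling: the only randomness is the injected noise, which is why the analysis relies on privacy amplification by iteration rather than by subsampling.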
Abstract
The hidden-state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. However, current privacy analyses under this model are restricted to convex optimization problems, limiting their applicability to the multi-layer neural networks that are essential in modern deep learning applications. Notably, the most successful applications of hidden-state privacy analyses in classification tasks have so far been limited to logistic regression models. We demonstrate that it is possible to privately solve convex problems with privacy-utility trade-offs comparable to those of 2-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). This is achieved through a stochastic approximation of a dual formulation of the two-layer ReLU minimization problem, resulting in a strongly convex problem. This makes existing hidden-state privacy analyses applicable and yields accurate privacy bounds, including for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Empirical results on benchmark classification tasks show that NoisyCGD can achieve privacy-utility trade-offs on par with DP-SGD applied to 2-layer ReLU networks.
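One common way to build a convex surrogate for a two-layer ReLU network is to fix the hidden units' activation patterns with randomly sampled gate vectors, which makes the model linear in its weights. The sketch below illustrates that idea under stated assumptions: gates g_i drawn from a standard Gaussian, the cone constraints of the exact convex reformulation dropped (a gated-ReLU relaxation), and an L2 penalty added so the logistic objective is strongly convex. The paper's stochastic dual approximation may differ in its details; `sample_gates`, `gated_features`, and `regularized_logistic_loss` are hypothetical names.

```python
import numpy as np


def sample_gates(d, num_gates, seed=0):
    """Draw gate vectors g_i ~ N(0, I_d), one per column; each fixes a pattern 1[X g_i >= 0]."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(d, num_gates))


def gated_features(X, G):
    """Return Phi with rows [1(g_1.x >= 0) x, ..., 1(g_P.x >= 0) x], shape (n, P * d).

    Phi @ W.reshape(-1) equals sum_i 1(g_i.x >= 0) * (x . W[i]) for W of shape (P, d):
    a two-layer ReLU-style network whose gates are frozen, hence linear in W.
    """
    masks = (X @ G >= 0).astype(X.dtype)  # (n, P) fixed activation patterns
    return (masks[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)


def regularized_logistic_loss(w, Phi, y, lam):
    """Mean logistic loss (labels in {-1, +1}) plus (lam / 2) ||w||^2: lam-strongly convex in w."""
    margins = y * (Phi @ w)
    # logaddexp(0, -m) = log(1 + exp(-m)), computed without overflow
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.dot(w, w)
```

The feature matrix `Phi` can then be passed to a private convex solver. Because the objective is `lam`-strongly convex in `w`, it falls within the strongly convex setting that hidden-state analyses require; Lipschitzness would additionally need bounded features or gradient clipping.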
