Efficient Public Verification of Private ML via Regularization
Zoë Ruha Bell, Anvith Thudi, Olive Franzese-McLaughlin, Nicolas Papernot, Shafi Goldwasser
TL;DR
This work addresses the problem of publicly verifying differential privacy guarantees for models trained on private data, showing that black-box auditing can be unreliable due to backdoors. It introduces a near-optimal DP-SCO algorithm whose privacy guarantees can be certified with substantially less computation than training, by modifying phased empirical risk minimization and relying on standard DP composition. The authors prove a DP verification protocol requiring only $n$ gradients and $d \big\lceil \log_2(n) \big\rfloor$ Gaussian random variable commitments, and they demonstrate practical verification costs and an extension to convex unlearning. The results offer a feasible path to public DP verification on large datasets and may influence verification approaches for DP-like guarantees beyond the studied setting.
Abstract
Training with differential privacy (DP) provides a guarantee to members of a dataset that they cannot be identified by users of the released model. However, those data providers, and the public in general, lack methods to efficiently verify that models trained on their data satisfy DP guarantees. The amount of compute needed to verify DP guarantees for current algorithms scales with the amount of compute required to train the model. In this paper, we design the first DP algorithm with near-optimal privacy-utility trade-offs whose DP guarantees can be verified more cheaply than training. We focus on DP stochastic convex optimization (DP-SCO), where optimal privacy-utility trade-offs are known. Here, we show that we can obtain tight privacy-utility trade-offs by privately minimizing a series of regularized objectives and using only the standard DP composition bound. Crucially, this method can be verified with much less compute than training. This leads to the first known DP-SCO algorithm with near-optimal privacy-utility trade-offs whose DP verification scales better than training cost, significantly reducing verification costs on large datasets.
