Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Tudor Cebere, Aurélien Bellet, Nicolas Papernot
TL;DR
The paper studies DP-SGD under the hidden state threat model, in which only the final model is released. It introduces gradient-crafting adversaries that fix a gradient sequence in advance to maximize the privacy loss of the final model, without relying on intermediate checkpoints. When the crafted gradient is inserted at every iteration ($k=1$), the empirical auditing lower bound matches the theoretical upper bound, so concealing intermediate updates yields no privacy amplification in this regime. For larger insertion periods and non-convex settings, a gap persists, but it can be narrowed by adversaries that also shape the loss landscape. These results show that privacy amplification from hiding the state is regime-dependent, and they motivate tighter privacy accounting in non-convex regimes.
Abstract
Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we focus on a threat model where the adversary has access only to the final model, with no visibility into intermediate updates. In the literature, this hidden state threat model exhibits a significant gap between the lower bound from empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To challenge this gap, we propose to audit this threat model with adversaries that craft a gradient sequence designed to maximize the privacy loss of the final model without relying on intermediate updates. Our experiments show that this approach consistently outperforms previous attempts at auditing the hidden state model. Furthermore, our results advance the understanding of achievable privacy guarantees within this threat model. Specifically, when the crafted gradient is inserted at every optimization step, we show that concealing the intermediate model updates in DP-SGD does not enhance the privacy guarantees. The situation is more complex when the crafted gradient is not inserted at every step: our auditing lower bound matches the privacy upper bound only for an adversarially-chosen loss landscape and a sufficiently large batch size. This suggests that existing privacy upper bounds can be improved in certain regimes.
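To make the auditing setup concrete, the following is a minimal toy sketch of a gradient-crafting adversary, not the authors' implementation. It assumes that all other training examples contribute zero gradient, that there is no subsampling, and that the crafted gradient is a fixed vector of norm equal to the clipping threshold, inserted every `k` steps. The function names `dp_sgd_final_model` and `audit_score` and all default parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def dp_sgd_final_model(canary_present, steps=100, k=1, dim=10,
                       clip=1.0, sigma=2.0, lr=0.1, rng=None):
    """Run a toy DP-SGD loop and return ONLY the final parameters.

    Toy assumptions (not from the paper): every non-canary example has
    zero gradient, and Gaussian noise with standard deviation
    sigma * clip is added at each step.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(dim)

    # Crafted gradient: fixed in advance, independent of intermediate
    # models, with norm exactly equal to the clipping threshold.
    canary = np.zeros(dim)
    canary[0] = clip

    for t in range(steps):
        grad_sum = np.zeros(dim)
        if canary_present and t % k == 0:
            grad_sum += canary
        noise = rng.normal(0.0, sigma * clip, size=dim)
        theta -= lr * (grad_sum + noise)

    # Hidden state: intermediate iterates are never exposed.
    return theta


def audit_score(theta, lr=0.1):
    """Project the final model onto the canary direction.

    Starting from theta = 0, this recovers the sum of the noisy
    gradients along the canary coordinate.
    """
    return -theta[0] / lr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trials = 500
    with_canary = [audit_score(dp_sgd_final_model(True, rng=rng))
                   for _ in range(trials)]
    without_canary = [audit_score(dp_sgd_final_model(False, rng=rng))
                      for _ in range(trials)]
    print("mean score with canary:   ", np.mean(with_canary))
    print("mean score without canary:", np.mean(without_canary))
```

In this toy setting the score is Gaussian in both worlds, with the same standard deviation $\sigma C \sqrt{T}$ and a mean shift of $C$ times the number of insertions, where $C$ is the clipping threshold and $T$ the number of steps. An auditor would threshold this score to distinguish the two worlds and convert the resulting error rates into an empirical privacy lower bound. The actual experiments in the paper involve real training data and loss landscapes, which this sketch deliberately omits.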
