When Gradient Clipping Becomes a Control Mechanism for Differential Privacy in Deep Learning
Mohammad Partohaghighi, Roummel Marcia, Bruce J. West, YangQuan Chen
TL;DR
This work tackles the clipping bottleneck in differentially private deep learning by recasting clipping as a closed-loop control problem. It introduces WW-DP-SGD, which uses a WeightWatcher-style spectral tail exponent $\boldsymbol{\alpha}$, computed from a privately trained weight matrix, to gauge training health and adapt the clipping threshold $C_t$ via a bounded, log-domain controller. The controller operates as post-processing of DP outputs and therefore does not increase privacy loss under standard accounting. Empirical results across vision and tabular tasks show that WW-DP-SGD consistently improves utility and stability over fixed clipping and top adaptive baselines, with modest runtime overhead and robustness to distribution shifts. The approach provides practical guidance on probe-layer selection, smoothing, and controller parameters, and it opens avenues for further theoretical work linking spectral properties to DP optimization dynamics.
Abstract
Privacy-preserving training on sensitive data commonly relies on differentially private stochastic optimization with gradient clipping and Gaussian noise. The clipping threshold is a critical control knob: if set too small, systematic over-clipping induces optimization bias; if too large, injected noise dominates updates and degrades accuracy. Existing adaptive clipping methods often depend on per-example gradient norm statistics, adding computational overhead and introducing sensitivity to datasets and architectures. We propose a control-driven clipping strategy that adapts the threshold using a lightweight, weight-only spectral diagnostic computed from model parameters. At periodic probe steps, the method analyzes a designated weight matrix via spectral decomposition and estimates a heavy-tailed spectral indicator associated with training stability. This indicator is smoothed over time and fed into a bounded feedback controller that updates the clipping threshold multiplicatively in the log domain. Because the controller uses only parameters produced during privacy-preserving training, the resulting threshold updates are post-processing and do not increase privacy loss beyond that of the underlying DP optimizer under standard composition accounting.
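The probe, smooth, and update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the Hill-type tail estimator, the exponential moving average, the target exponent `alpha_target`, the gain, the step limit, the threshold bounds, and the sign convention of the error term are all hypothetical choices made here for concreteness. The function names `tail_exponent`, `ema`, and `update_clip` are likewise illustrative.

```python
import numpy as np


def tail_exponent(weight, tail_frac=0.2):
    """Hill-type estimate of the power-law exponent of the spectrum of W^T W.

    Assumption: a fixed fraction of the largest eigenvalues defines the tail.
    """
    sv = np.linalg.svd(weight, compute_uv=False)   # singular values of W
    eig = np.sort(sv ** 2)[::-1]                   # eigenvalues of W^T W, descending
    k = max(2, int(tail_frac * eig.size))          # number of tail points
    tail = eig[:k]
    x_min = tail[-1]                               # smallest value in the tail
    log_ratio_sum = np.log(tail / x_min).sum()
    return float(1.0 + k / max(log_ratio_sum, 1e-12))


def ema(prev, new, beta=0.9):
    """Exponential moving average; the first observation initializes the state."""
    return new if prev is None else beta * prev + (1.0 - beta) * new


def update_clip(log_c, alpha_smooth, alpha_target=3.0, gain=0.1, max_step=0.05,
                log_c_min=float(np.log(0.1)), log_c_max=float(np.log(10.0))):
    """Bounded multiplicative update of the clipping threshold in the log domain.

    Assumption: the sign of the error term is illustrative only.
    """
    error = alpha_smooth - alpha_target
    step = np.clip(gain * error, -max_step, max_step)       # bounded per-probe change
    return float(np.clip(log_c + step, log_c_min, log_c_max))  # bounded threshold


# Toy probe loop: random matrices stand in for the designated weight matrix.
rng = np.random.default_rng(0)
log_c, alpha_s = float(np.log(1.0)), None
for _ in range(3):
    w = rng.standard_normal((64, 32))
    alpha_s = ema(alpha_s, tail_exponent(w))
    log_c = update_clip(log_c, alpha_s)
clip_threshold = float(np.exp(log_c))
```

Every input to the update is a function of model weights that the DP optimizer has already released, which is the property the post-processing argument relies on. Working in the log domain keeps the threshold positive and makes each probe step a bounded multiplicative change.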
