Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
Raef Bassily, Adam Smith, Abhradeep Thakurta
TL;DR
This work gives a systematic treatment of differentially private empirical risk minimization (ERM) for convex losses where each data point's contribution is Lipschitz and the optimization domain is bounded. It introduces three main algorithmic approaches: gradient descent with added privacy noise, sampling based on the exponential mechanism, and localization for strongly convex losses. Each achieves near-tight excess risk bounds under either (ε,0)- or (ε,δ)-differential privacy. The authors establish matching lower bounds that hold even for simple linear and quadratic losses. They also develop efficient log-concave sampling techniques so that private optimization over general convex sets runs in polynomial time. The results sharpen the theoretical understanding of privacy-utility trade-offs, covering both non-smooth and strongly convex losses. They also yield optimal private algorithms for problems such as training support vector machines and computing the high-dimensional median.
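The first approach, gradient descent with added privacy noise, can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper. It runs full-batch noisy projected subgradient descent, a simplification that is not presented as the authors' exact algorithm. The loss is the SVM hinge loss, and every feature vector satisfies ‖x‖₂ ≤ L. The domain is an L2 ball of radius R. Privacy is accounted for with zero-concentrated DP (zCDP) and its standard conversion to (ε,δ)-DP. All function names are hypothetical.

```python
import math
import random


def project_l2_ball(theta, radius):
    """Euclidean projection onto the ball {v : ||v||_2 <= radius}."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]


def hinge_subgradient(theta, x, y):
    """A subgradient of the SVM hinge loss max(0, 1 - y * <theta, x>)."""
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    return [-y * xi for xi in x] if margin < 1 else [0.0] * len(x)


def zcdp_rho_for(eps, delta):
    """Largest rho with rho + 2*sqrt(rho*log(1/delta)) <= eps, i.e. the
    zCDP budget that converts to (eps, delta)-DP."""
    a = math.log(1 / delta)
    return (math.sqrt(a + eps) - math.sqrt(a)) ** 2


def noisy_projected_gd(data, eps, delta, radius, lipschitz, steps, seed=None):
    """Full-batch noisy projected subgradient descent (hypothetical sketch).

    Assumes every feature vector satisfies ||x||_2 <= lipschitz, so each
    per-point hinge subgradient has norm at most `lipschitz`.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0][0])
    # Replacing one point moves the average subgradient by at most 2L/n.
    sensitivity = 2 * lipschitz / n
    # A Gaussian step with std sigma is sensitivity^2 / (2 sigma^2)-zCDP and
    # zCDP adds up over steps, so this sigma spends rho / steps per step.
    rho = zcdp_rho_for(eps, delta)
    sigma = sensitivity * math.sqrt(steps / (2 * rho))
    # Textbook projected-subgradient step size, up to constants.
    eta = radius / math.sqrt(steps * (lipschitz ** 2 + p * sigma ** 2))
    theta = [0.0] * p
    average = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for x, y in data:
            g = hinge_subgradient(theta, x, y)
            grad = [a + b / n for a, b in zip(grad, g)]
        noisy = [g + rng.gauss(0.0, sigma) for g in grad]
        theta = project_l2_ball([t - eta * g for t, g in zip(theta, noisy)], radius)
        # Averaging the iterates is post-processing, so it costs no privacy.
        average = [a + t / steps for a, t in zip(average, theta)]
    return average
```

For example, `noisy_projected_gd(data, eps=1.0, delta=1e-5, radius=2.0, lipschitz=1.0, steps=200)` returns an averaged iterate inside the ball. Shrinking ε raises sigma and therefore the excess empirical risk. Sampling-based amplification and tighter accounting, which a full treatment would use, are deliberately left out here.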
Abstract
In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for $(ε,0)$- and $(ε,δ)$-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contributions of each data point to the loss function are smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.
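The abstract notes that optimal $(ε,0)$ and $(ε,δ)$ algorithms rely on entirely different techniques. On the $(ε,0)$ side, the exponential mechanism gives pure ε-differential privacy. It samples a candidate with probability that decays exponentially in its empirical risk. The sketch below is a minimal illustration for the one-dimensional median, which minimizes the empirical absolute loss $\sum_i |\theta - x_i|$. It makes assumptions that do not come from the paper. The data and the parameter lie in $[-R, R]$. Sampling over a finite grid replaces the efficient log-concave sampling over general convex sets described in the TL;DR. The function name `exponential_mechanism_median` and the `grid_size` parameter are hypothetical.

```python
import math
import random


def exponential_mechanism_median(data, eps, radius, grid_size=201, seed=None):
    """(eps, 0)-DP approximate 1-D median via the exponential mechanism.

    Hypothetical sketch: candidates form a finite grid on [-radius, radius]
    (grid_size >= 2), standing in for continuous log-concave sampling.
    """
    rng = random.Random(seed)
    # Clip each point into the domain; a fixed per-point map keeps the
    # sensitivity bound below valid for arbitrary inputs.
    clipped = [min(max(x, -radius), radius) for x in data]
    candidates = [-radius + 2 * radius * k / (grid_size - 1) for k in range(grid_size)]
    # Score = -sum_i |theta - x_i|. Replacing one point changes it by at most
    # 2 * radius, since both theta and x_i lie in [-radius, radius].
    sensitivity = 2 * radius
    log_weights = [
        -eps * sum(abs(c - x) for x in clipped) / (2 * sensitivity) for c in candidates
    ]
    top = max(log_weights)
    # Subtract the max before exponentiating to avoid underflow; this only
    # rescales the weights, so the sampling distribution is unchanged.
    weights = [math.exp(w - top) for w in log_weights]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Suppose many points share one value and ε is large. The output then concentrates on the grid point nearest that value. At small ε it spreads across the whole interval. Unlike the noisy-gradient approach, no noise is added to gradients: randomness enters only through the final sampling step.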
