Differentially Private Bilevel Optimization

Guy Kornowski

TL;DR

This work studies differential privacy in bilevel optimization, where the outer objective depends on the minimizer of an inner problem. It introduces gradient-based $(\epsilon,\delta)$-DP algorithms that avoid Hessian computations, and proves that the returned point has hypergradient norm at most $\widetilde{\mathcal{O}}\left((\sqrt{d_{\mathrm{up}}}/(\epsilon n))^{1/2}+(\sqrt{d_{\mathrm{low}}}/(\epsilon n))^{1/3}\right)$ when the upper-level problem is smooth but not necessarily convex and the lower-level problem is strongly convex. The paper develops a DP bilevel ERM algorithm and a practical mini-batch variant, extends the guarantees to population losses, and addresses privacy leakage through private inner solvers. As an application, it derives a private, on-the-fly update rule for tuning a regularization hyperparameter. Overall, the results establish the first central-DP, gradient-only methods for bilevel optimization with rigorous high-probability utility guarantees, enabling private hyperparameter tuning and scalable private bilevel learning.
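
For orientation, the bilevel setup and the quantity being bounded can be written out as below. This is the textbook formulation, not a statement copied from the paper: the symbols $f$, $g$, $F$ and $y^*$ are conventional notation assumed here, and the hypergradient identity is the classical implicit-differentiation formula for the unconstrained case, valid when $g(x,\cdot)$ is strongly convex and twice differentiable. It shows where the Hessian inverse appears that gradient-only methods are designed to avoid.

```latex
\begin{aligned}
\min_{x\in\mathbb{R}^{d_{\mathrm{up}}}} \; & F(x) := f\bigl(x, y^*(x)\bigr),
\qquad y^*(x) := \arg\min_{y\in\mathbb{R}^{d_{\mathrm{low}}}} g(x,y), \\[4pt]
\nabla F(x) \;=\; & \nabla_x f\bigl(x,y^*(x)\bigr)
 \;-\; \nabla^2_{xy} g\bigl(x,y^*(x)\bigr)\,
 \bigl[\nabla^2_{yy} g\bigl(x,y^*(x)\bigr)\bigr]^{-1}\,
 \nabla_y f\bigl(x,y^*(x)\bigr).
\end{aligned}
```

The second line follows from differentiating the optimality condition $\nabla_y g(x,y^*(x))=0$ with respect to $x$ and applying the chain rule to $F$.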

Abstract

We present differentially private (DP) algorithms for bilevel optimization, a problem class that has recently received significant attention in various machine learning applications. These are the first algorithms for such problems under standard DP constraints, and are also the first to avoid Hessian computations, which are prohibitive in large-scale settings. Under the well-studied setting in which the upper-level problem is not necessarily convex and the lower-level problem is strongly convex, our proposed gradient-based $(\epsilon,\delta)$-DP algorithm returns a point with hypergradient norm at most $\widetilde{\mathcal{O}}\left((\sqrt{d_\mathrm{up}}/(\epsilon n))^{1/2}+(\sqrt{d_\mathrm{low}}/(\epsilon n))^{1/3}\right)$, where $n$ is the dataset size and $d_\mathrm{up}$, $d_\mathrm{low}$ are the upper- and lower-level dimensions. Our analysis covers constrained and unconstrained problems alike, accounts for mini-batch gradients, and applies to both empirical and population losses. As an application, we specialize our analysis to derive a simple private rule for tuning a regularization hyperparameter.
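
To get a feel for how the stated rate scales, the two terms of the bound can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `hypergradient_rate` is hypothetical, all constants and logarithmic factors hidden by the $\widetilde{\mathcal{O}}$ are dropped, and the denominator is assumed to be the product $\epsilon n$.

```python
import math


def hypergradient_rate(n, eps, d_up, d_low):
    """Evaluate the two terms of the stated rate.

    Assumption: constants and log factors hidden by the O-tilde are
    ignored, so only the scaling in n, eps, d_up, d_low is meaningful.
    Returns (upper_term, lower_term, total).
    """
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    if d_up <= 0 or d_low <= 0:
        raise ValueError("dimensions must be positive")
    # (sqrt(d_up) / (eps * n)) ** (1/2): upper-level term
    upper_term = (math.sqrt(d_up) / (eps * n)) ** 0.5
    # (sqrt(d_low) / (eps * n)) ** (1/3): lower-level term
    lower_term = (math.sqrt(d_low) / (eps * n)) ** (1.0 / 3.0)
    return upper_term, lower_term, upper_term + lower_term


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (10**4, 10**5, 10**6):
        up, low, total = hypergradient_rate(n, eps=1.0, d_up=100, d_low=10_000)
        print(f"n={n:>8}: upper={up:.4f} lower={low:.4f} total={total:.4f}")
```

When both ratios $\sqrt{d}/(\epsilon n)$ are below 1 and $d_\mathrm{low}\geq d_\mathrm{up}$, the cube-root term is the larger of the two, so in this sketch the lower-level dimension drives the overall rate.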

Paper Structure (28 sections, 23 theorems, 85 equations, 4 algorithms)


Key Result

Theorem 4.1

Assume ass: main 1 and ass: main 2 hold, and that $\alpha\leq \ell\kappa^{3}\min\{\frac{1}{2\kappa},\frac{L_0^g}{L_0^f},\frac{L_1^g}{L_1^f},\frac{\Delta_F}{\ell\kappa}\}$. Then there is a parameter assignment $\lambda\asymp\ell\kappa^3\alpha^{-1},~\sigma^2\asymp\ell^2\kappa^2T\log(T/\delta)\epsilon^{\dots}$ [...] where $K_1=\mathcal{O}(\Delta_F^{1/4}\ell^{3/4}\kappa^{5/4}),~K_2=\mathcal{O}(\Delta_F^{1/6}\ell^{5\dots}$ [...]

Theorems & Definitions (40)

  • Remark 2.3
  • Example 3.1
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Lemma 4.4 (kwon2023fully, chen2024finding, chen2025near)
  • Lemma 4.4
  • Lemma 4.4
  • Proposition 4.4
  • Theorem 5.1
  • ...and 30 more