Differentially Private Bilevel Optimization

Guy Kornowski

TL;DR

This work studies differential privacy in bilevel optimization, where the outer (upper-level) objective depends on the minimizer of an inner (lower-level) problem. It introduces gradient-based $(\epsilon,\delta)$-DP algorithms that avoid costly Hessian computations and inversions, and proves that they return a point whose hypergradient norm is at most $\widetilde{\mathcal{O}}\left((\sqrt{d_{\mathrm{up}}}/(\epsilon n))^{1/2}+(\sqrt{d_{\mathrm{low}}}/(\epsilon n))^{1/3}\right)$ under standard smoothness and strong-convexity assumptions. The paper develops a DP bilevel ERM algorithm and a practical mini-batch variant, extends the guarantees to population losses, and prevents privacy leakage from the lower level to the upper level by using private inner solvers. As an application, it derives a simple private update rule for tuning a regularization hyperparameter. Overall, the results establish the first central-DP, gradient-only methods for bilevel optimization with rigorous high-probability utility guarantees, enabling private hyperparameter tuning and scalable private bilevel learning.
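
For orientation, the problem class can be written in the standard form below. This is a sketch of the usual unconstrained formulation from the bilevel literature, not a quotation from the paper: the symbols $x$, $y$, $f$, $g$, $F$ and $y^*$ are notation assumed here, and the second identity (which follows from the implicit function theorem when $g$ is twice differentiable and strongly convex in $y$) shows where the Hessian inverse that gradient-only methods avoid would normally appear.

$$
\min_{x\in\mathbb{R}^{d_{\mathrm{up}}}} F(x) := f\bigl(x, y^*(x)\bigr),
\qquad
y^*(x) := \arg\min_{y\in\mathbb{R}^{d_{\mathrm{low}}}} g(x,y)
$$

$$
\nabla F(x) = \nabla_x f\bigl(x,y^*(x)\bigr)
- \nabla^2_{xy} g\bigl(x,y^*(x)\bigr)\,\bigl[\nabla^2_{yy} g\bigl(x,y^*(x)\bigr)\bigr]^{-1}\,\nabla_y f\bigl(x,y^*(x)\bigr)
$$

The quantity $\|\nabla F(x)\|$ is the hypergradient norm that the stated bound controls.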

Abstract

We present differentially private (DP) algorithms for bilevel optimization, a problem class that has recently received significant attention in various machine learning applications. These are the first algorithms for such problems under standard DP constraints, and are also the first to avoid Hessian computations, which are prohibitive in large-scale settings. Under the well-studied setting in which the upper-level problem is not necessarily convex and the lower-level problem is strongly convex, our proposed gradient-based $(\epsilon,\delta)$-DP algorithm returns a point with hypergradient norm at most $\widetilde{\mathcal{O}}\left((\sqrt{d_\mathrm{up}}/(\epsilon n))^{1/2}+(\sqrt{d_\mathrm{low}}/(\epsilon n))^{1/3}\right)$, where $n$ is the dataset size, and $d_\mathrm{up}$ and $d_\mathrm{low}$ are the upper- and lower-level dimensions, respectively. Our analysis covers constrained and unconstrained problems alike, accounts for mini-batch gradients, and applies to both empirical and population losses. As an application, we specialize our analysis to derive a simple private rule for tuning a regularization hyperparameter.
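
To get a feel for how the two terms of the bound trade off, the following minimal Python sketch evaluates them numerically. It is only an illustration: the function name `hypergradient_rate` is hypothetical, and all constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are set to 1, so the numbers indicate scaling only, not actual guarantees.

```python
import math


def hypergradient_rate(n, eps, d_up, d_low):
    """Evaluate the two terms of the stated rate, ignoring constants and log factors.

    Returns (upper_term, lower_term, total), where
      upper_term = (sqrt(d_up)  / (eps * n)) ** (1/2)
      lower_term = (sqrt(d_low) / (eps * n)) ** (1/3)
    """
    upper_term = (math.sqrt(d_up) / (eps * n)) ** 0.5
    lower_term = (math.sqrt(d_low) / (eps * n)) ** (1.0 / 3.0)
    return upper_term, lower_term, upper_term + lower_term


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    for n in (10**4, 10**6, 10**8):
        up, low, total = hypergradient_rate(n, eps=1.0, d_up=100, d_low=100)
        print(f"n={n:>9}: upper={up:.4g}, lower={low:.4g}, total={total:.4g}")
```

Whenever $\sqrt{d}/(\epsilon n) < 1$ and the two dimensions are equal, the lower-level term (exponent $1/3$) is the larger of the two, so under this simplified reading it dominates the rate.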

Paper Structure (28 sections, 23 theorems, 85 equations, 4 algorithms)

Introduction
Our contributions
Related work
Preliminaries
Notation and terminology.
Differential privacy.
Gradient mapping.
Setting
Warm up: Privacy can leak from lower to upper level
Algorithm for DP bilevel ERM
Analysis overview
Mini-batch algorithm for DP bilevel ERM
Generalizing from ERM to population loss
Application: private regularization hyperparameter tuning
Proofs
...and 13 more sections

Key Result

Theorem 4.1

Assume that the main assumptions (ass: main 1 and ass: main 2) hold, and that $\alpha\leq \ell\kappa^{3}\min\{\frac{1}{2\kappa},\frac{L_0^g}{L_0^f},\frac{L_1^g}{L_1^f},\frac{\Delta_F}{\ell\kappa}\}$. Then there is a parameter assignment $\lambda\asymp\ell\kappa^3\alpha^{-1},~\sigma^2\asymp\ell^2\kappa^2T\log(T/\delta)\epsilon^{\dots}$ [...] where $K_1=\mathcal{O}(\Delta_F^{1/4}\ell^{3/4}\kappa^{5/4}),~K_2=\mathcal{O}(\Delta_F^{1/6}\ell^{5\dots})$ [...]
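
The roles of a penalty parameter $\lambda$ and a Gaussian noise scale $\sigma$ can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes the penalty reformulation from the fully first-order bilevel literature, in which $\nabla F(x)$ is approximated by $\nabla_x f(x,y_\lambda)+\lambda\,(\nabla_x g(x,y_\lambda)-\nabla_x g(x,y^*))$ with $y_\lambda=\arg\min_y f+\lambda g$; the appearance of $\lambda$ in the theorem suggests such a scheme, but this summary does not spell it out. The toy objectives, the function names, and the step sizes are all hypothetical, and the added noise is not calibrated to any $(\epsilon,\delta)$: in particular the inner solves here are not private, so the sketch carries no DP guarantee.

```python
import numpy as np


def grad_descent(grad, y0, step, iters):
    """Plain gradient descent; uses gradients only, no Hessians."""
    y = y0.copy()
    for _ in range(iters):
        y = y - step * grad(y)
    return y


def penalty_hypergradient(x, A, b, lam, sigma=0.0, rng=None, iters=60):
    """Noisy first-order hypergradient estimate on a toy quadratic problem.

    Toy objectives (assumed for illustration only):
      f(x, y) = 0.5 * ||y - b||^2        (upper level)
      g(x, y) = 0.5 * ||y - A x||^2      (lower level, strongly convex in y)
    so that y*(x) = A x and grad F(x) = A^T (A x - b).
    """
    Ax = A @ x
    grad_y_f = lambda y: y - b
    grad_y_g = lambda y: y - Ax
    grad_x_f = lambda y: np.zeros_like(x)      # f does not depend on x directly
    grad_x_g = lambda y: -A.T @ (y - Ax)

    y0 = np.zeros_like(b)
    # Approximate y* = argmin_y g(x, y).
    y_star = grad_descent(grad_y_g, y0, step=0.5, iters=iters)
    # Approximate y_lam = argmin_y f(x, y) + lam * g(x, y).
    y_lam = grad_descent(lambda y: grad_y_f(y) + lam * grad_y_g(y),
                         y0, step=0.5 / (1.0 + lam), iters=iters)

    est = grad_x_f(y_lam) + lam * (grad_x_g(y_lam) - grad_x_g(y_star))
    if sigma > 0:
        # Gaussian perturbation; sigma is NOT calibrated to a privacy budget here.
        rng = np.random.default_rng() if rng is None else rng
        est = est + sigma * rng.standard_normal(est.shape)
    return est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    x = rng.standard_normal(3)
    b = rng.standard_normal(4)
    exact = A.T @ (A @ x - b)
    for lam in (1.0, 10.0, 100.0):
        est = penalty_hypergradient(x, A, b, lam)
        print(lam, np.linalg.norm(est - exact))
```

On this toy problem the noiseless estimate equals $\frac{\lambda}{1+\lambda}\nabla F(x)$, so the approximation error shrinks like $1/\lambda$. That is consistent with a larger penalty being needed for a smaller target accuracy $\alpha$, while a larger $\lambda$ also amplifies any error or noise in the inner solutions.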

Theorems & Definitions (40)

Remark 2.3
Example 3.1
Theorem 4.1
Remark 4.2
Remark 4.3
Lemma 4.4 (kwon2023fully; chen2024finding; chen2025near)
Lemma 4.4
Lemma 4.4
Proposition 4.4
Theorem 5.1
...and 30 more
