Differentially Private Non-convex Distributionally Robust Optimization

Difei Xu; Meng Ding; Zebin Ma; Huanyi Xie; Youming Tao; Aicha Slaitane; Di Wang

TL;DR

A comprehensive study of DP-(finite-sum)-DRO with $\psi$-divergence and non-convex loss. It introduces DP Double-Spider, a DP optimization method tailored to general $\psi$-divergence, and DP Recursive-Spider for KL-divergence, which achieves a utility bound matching the best-known result for non-convex DP-ERM.

Abstract

Real-world deployments routinely face distribution shifts, group imbalances, and adversarial perturbations, under which the traditional Empirical Risk Minimization (ERM) framework can degrade severely. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case expected loss over an uncertainty set of distributions, offering a principled approach to robustness. Meanwhile, as training data in DRO often involves sensitive information, safeguarding it against leakage under Differential Privacy (DP) is essential. In contrast to classical DP-ERM, DP-DRO has received much less attention due to its minimax optimization structure with an uncertainty constraint. To bridge this gap, we provide a comprehensive study of DP-(finite-sum)-DRO with $\psi$-divergence and non-convex loss. First, we study DRO with general $\psi$-divergence by reformulating it as a minimization problem, and develop a novel $(\varepsilon, \delta)$-DP optimization method, called DP Double-Spider, tailored to this structure. Under mild assumptions, we show that it achieves a utility bound of $\mathcal{O}(\frac{1}{\sqrt{n}}+ (\frac{\sqrt{d \log (1/\delta)}}{n \varepsilon})^{2/3})$ in terms of the gradient norm, where $n$ denotes the data size and $d$ denotes the model dimension. We further improve the utility rate for specific divergences. In particular, for DP-DRO with KL-divergence, by transforming the problem into a compositional finite-sum optimization problem, we develop a DP Recursive-Spider method and show that it achieves a utility bound of $\mathcal{O}((\frac{\sqrt{d \log(1/\delta)}}{n\varepsilon})^{2/3})$, matching the best-known result for non-convex DP-ERM. Experimentally, we demonstrate that our proposed methods outperform existing approaches for DP minimax optimization.
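To make the KL-divergence case concrete, here is a minimal sketch that assumes the common KL-penalized form of finite-sum DRO, $\max_{p} \sum_i p_i \ell_i - \lambda\, \mathrm{KL}(p \,\|\, \text{uniform})$ over probability vectors $p$. The paper's exact formulation and constants are not given in this summary and may differ. Under this assumption the inner maximization has the closed form $\lambda \log(\frac{1}{n}\sum_i \exp(\ell_i/\lambda))$, an outer function applied to a finite-sum inner function, which is one way to read the "compositional finite-sum" structure. The function names and the parameter `lam` are hypothetical, chosen only for illustration.

```python
import numpy as np

def kl_dro_objective(losses, lam):
    """Closed form lam * log(mean(exp(losses / lam))) of the assumed
    KL-penalized inner maximization."""
    z = np.asarray(losses, dtype=float) / lam
    m = z.max()  # shift by the max for numerical stability (log-sum-exp trick)
    return float(lam * (m + np.log(np.mean(np.exp(z - m)))))

def worst_case_weights(losses, lam):
    """Maximizing weights p_i proportional to exp(losses_i / lam)."""
    z = np.asarray(losses, dtype=float) / lam
    w = np.exp(z - z.max())
    return w / w.sum()

def penalized_value(losses, p, lam):
    """Evaluate sum_i p_i * l_i - lam * KL(p || uniform) for strictly positive p."""
    losses = np.asarray(losses, dtype=float)
    p = np.asarray(p, dtype=float)
    n = losses.size
    kl = np.sum(p * np.log(p * n))  # KL divergence from the uniform distribution
    return float(p @ losses - lam * kl)

# Illustrative per-sample losses (made-up numbers, not from the paper).
losses = [0.1, 0.5, 2.0]
lam = 1.0
p_star = worst_case_weights(losses, lam)
closed_form = kl_dro_objective(losses, lam)
direct = penalized_value(losses, p_star, lam)
```

Here `closed_form` and `direct` agree up to floating-point error, and the robust objective lies between the average loss and the maximum loss. Smaller `lam` pushes the weights toward the hardest samples.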

Paper Structure (20 sections, 18 theorems, 78 equations, 3 figures, 3 tables, 3 algorithms)

This paper contains 20 sections, 18 theorems, 78 equations, 3 figures, 3 tables, 3 algorithms.

Introduction
Preliminaries
Differential Privacy
Distributionally Robust Optimization
DP Double-SPIDER
Improved Rates via DP Recursive-SPIDER
Experiments
Experimental Setup
MIA Experiments
Experimental Results
Test Accuracy Analysis
Gradient Norm Analysis
MIA Experiments
Conclusion
Related Work
...and 5 more sections

Key Result

Theorem 1

For any $\varepsilon>0$ and $\delta\in (0, 1)$, let $\sigma_1=\mathcal{O}(\frac{ C_1 \sqrt{T \log (1/\delta)}}{n\sqrt{q}\varepsilon})$, $\sigma_2=\mathcal{O}(\frac{C_2\sqrt{\log (1/\delta)}}{N_2 \varepsilon})$. Similarly, set $\sigma_3=\mathcal{O}(\frac{C_3 \sqrt{T\log (1/\delta)}}{n \sqrt{q}\vareps

Figures (3)

Figure 1: Experimental results. Performance of the four algorithms on CIFAR10-ST, CelebA, Fashion-MNIST, and MNIST-ST.
Figure 2: Test AUC results. Performance of the four algorithms on CIFAR10-ST, CelebA, Fashion-MNIST, and MNIST-ST.
Figure 3: Test F1 score results. Performance of the four algorithms on CIFAR10-ST, CelebA, Fashion-MNIST, and MNIST-ST.

Theorems & Definitions (38)

Definition 1: Differential Privacy (Dwork et al., 2006)
Definition 2
Definition 3
Definition 4: DP-DRO
Definition 5
Definition 6
Definition 7: Generalized $(L_0,L_1)$-smooth
Definition 8
Theorem 1
Theorem 2
...and 28 more
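
Definition 1 refers to the standard notion of $(\varepsilon, \delta)$-differential privacy. As a rough illustration of how gradient-based methods typically meet it, the sketch below performs a single Gaussian-mechanism release of an averaged, clipped gradient. It is not the paper's DP Double-Spider or DP Recursive-Spider procedure. It assumes replace-one neighboring datasets and the classical noise calibration $\sigma = \Delta \sqrt{2 \ln(1.25/\delta)}/\varepsilon$, which is valid for $0 < \varepsilon < 1$. All function and parameter names are hypothetical.

```python
import numpy as np

def clip_rows(grads, clip_norm):
    """Scale each per-sample gradient so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale

def gaussian_sigma(sensitivity, eps, delta):
    """Classical Gaussian-mechanism noise scale (assumes 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def private_mean_gradient(grads, clip_norm, eps, delta, rng):
    """One noisy release of the mean clipped gradient."""
    grads = np.asarray(grads, dtype=float)
    n, d = grads.shape
    clipped = clip_rows(grads, clip_norm)
    # Replacing one sample moves the mean by at most 2 * clip_norm / n in L2 norm.
    sensitivity = 2.0 * clip_norm / n
    sigma = gaussian_sigma(sensitivity, eps, delta)
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

# Illustrative inputs (made-up numbers, not from the paper).
rng = np.random.default_rng(0)
example_grads = np.array([[3.0, 4.0], [0.3, 0.4]])
noisy = private_mean_gradient(example_grads, clip_norm=1.0, eps=0.5, delta=1e-5, rng=rng)
```

The noise scale here shrinks as $\sqrt{\log(1/\delta)}/(n\varepsilon)$, the same dependence that appears in the $\sigma$ values of Theorem 1. An iterative method would additionally need to account for composition across steps, which this single-release sketch does not do.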
