Private Zeroth-Order Optimization with Public Data

Xuchen Gong, Tian Li

TL;DR

PAZO introduces public-data-assisted private zeroth-order optimization to narrow the privacy/utility gap in DP training. The framework comprises three variants: PAZO-M (mixing private zeroth-order estimates with public gradients), PAZO-P (restricting updates to the subspace spanned by public gradients), and PAZO-S (selecting the best public gradient). Together they improve convergence under DP while retaining the efficiency of zeroth-order methods. Theoretical results establish convergence under γ-similarity between public and private data, with reduced dependence on the model dimension. Empirical results across vision and language tasks show superior privacy/utility tradeoffs in highly private settings and up to 16× faster training iterations than first-order baselines. The approach performs robustly in both pre-training and fine-tuning, highlighting public data as a practical catalyst for DP training across diverse domains.
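The three variants can be pictured as three ways of building the search direction. The sketch below is a minimal NumPy illustration under our own assumptions, not the paper's algorithms. The DPZero-style privatization (per-example clipping of a scalar finite difference plus Gaussian noise), the convex mixing weight `alpha` in PAZO-M, the QR-based subspace sampling in PAZO-P, and the noisy-loss scoring rule in PAZO-S are all hypothetical choices. The names `private_dir_deriv`, `pazo_m_direction`, `pazo_p_direction`, and `pazo_s_step` are illustrative, and no privacy accounting is included.

```python
import numpy as np

def private_dir_deriv(loss_fn, theta, batch, u, mu, clip, sigma, rng):
    """Per-example two-point directional derivatives along u, clipped to
    [-clip, clip], summed, noised with N(0, (sigma*clip)^2), then averaged."""
    d = np.array([(loss_fn(theta + mu * u, x) - loss_fn(theta - mu * u, x)) / (2 * mu)
                  for x in batch])
    return (np.clip(d, -clip, clip).sum() + rng.normal(0.0, sigma * clip)) / len(batch)

def pazo_m_direction(loss_fn, theta, batch, g_pub, alpha, mu, clip, sigma, rng):
    # Hypothetical PAZO-M: convex combination of a public gradient and a
    # private zeroth-order estimate along a random Gaussian direction.
    u = rng.standard_normal(theta.shape)
    g_zo = private_dir_deriv(loss_fn, theta, batch, u, mu, clip, sigma, rng) * u
    return alpha * g_pub + (1.0 - alpha) * g_zo

def pazo_p_direction(loss_fn, theta, batch, G_pub, mu, clip, sigma, rng):
    # Hypothetical PAZO-P: the random direction is drawn inside the span of the
    # k public gradients (rows of G_pub), so the update stays in a
    # k-dimensional subspace instead of all d dimensions.
    Q, _ = np.linalg.qr(G_pub.T)             # (d, k) orthonormal basis
    u = Q @ rng.standard_normal(Q.shape[1])
    return private_dir_deriv(loss_fn, theta, batch, u, mu, clip, sigma, rng) * u

def pazo_s_step(loss_fn, theta, batch, G_pub, lr, clip, sigma, rng):
    # Hypothetical PAZO-S: try a step along each public gradient, score it by
    # the clipped and noised average private loss, and keep the best one.
    scores = []
    for g in G_pub:
        losses = np.clip([loss_fn(theta - lr * g, x) for x in batch], 0.0, clip)
        scores.append((losses.sum() + rng.normal(0.0, sigma * clip)) / len(batch))
    best = int(np.argmin(scores))
    return theta - lr * G_pub[best], best
```

With `sigma=0` the functions reduce to non-private estimates, which is convenient for checking shapes and behavior; a real DP run would also need a privacy accountant to choose `sigma` for the total number of noisy queries.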

Abstract

One of the major bottlenecks for deploying popular first-order differentially private (DP) machine learning algorithms (e.g., DP-SGD) is their high computation and memory cost, despite the existence of optimized implementations. Zeroth-order methods are a promising way to reduce this overhead: they approximate gradients using only function evaluations and are therefore significantly easier to privatize. While recent works have explored zeroth-order approaches in both private and non-private settings, these methods still achieve relatively low utility compared with DP-SGD and have only been evaluated in limited application domains. In this work, we propose to leverage public information to guide and improve the gradient approximation of private zeroth-order algorithms. We explore a suite of public-data-assisted zeroth-order optimizers (PAZO) with minimal overhead, and we provide theoretical analyses of the PAZO framework under an assumption on the similarity between public and private data. Empirically, we demonstrate that PAZO achieves superior privacy/utility tradeoffs across vision and text tasks in both pre-training and fine-tuning settings, outperforming the best first-order baselines (with public data) especially in highly private regimes, while offering up to $16\times$ runtime speedup.
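Why zeroth-order methods are easier to privatize can be made concrete by comparing what each method clips and noises. This sketch assumes a hypothetical linear least-squares model, and the names `sq_loss`, `dp_sgd_grad`, and `dp_zo_grad` are ours. DP-SGD clips per-example d-dimensional gradients and adds d-dimensional Gaussian noise, whereas a DPZero-style two-point estimator clips one scalar per example and adds a single scalar of noise, using only forward evaluations of the loss.

```python
import numpy as np

def sq_loss(theta, x, y):
    # Hypothetical per-example squared loss for a linear model.
    return 0.5 * (x @ theta - y) ** 2

def dp_sgd_grad(theta, X, Y, clip, sigma, rng):
    # First-order: per-example d-dim gradients (backprop in a deep network),
    # each clipped to L2 norm <= clip, plus d-dimensional Gaussian noise.
    grads = (X @ theta - Y)[:, None] * X                    # (b, d)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noise = rng.normal(0.0, sigma * clip, size=theta.shape)  # d noise draws
    return (grads.sum(axis=0) + noise) / len(X)

def dp_zo_grad(theta, X, Y, clip, sigma, mu, rng):
    # Zeroth-order: two forward evaluations along one random direction u;
    # each per-example scalar is clipped, then a single scalar noise draw.
    u = rng.standard_normal(theta.shape)
    diffs = np.array([(sq_loss(theta + mu * u, x, y) - sq_loss(theta - mu * u, x, y)) / (2 * mu)
                      for x, y in zip(X, Y)])
    noisy = (np.clip(diffs, -clip, clip).sum() + rng.normal(0.0, sigma * clip)) / len(X)
    return noisy * u
```

For a quadratic loss the central difference is exact, so with `sigma=0` and a large `clip` the zeroth-order estimate equals $(\nabla f^\top u)\,u$, which matches $\nabla f$ in expectation over $u \sim \mathcal{N}(0, I)$.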

Paper Structure

This paper contains 43 sections, 10 theorems, 128 equations, 8 figures, 13 tables, 3 algorithms.

Key Result

Theorem 4.1

Assume public and private data are $\gamma$-similar and let Assumptions 1-4 hold. For possibly non-convex $f(\cdot)$, running PAZO-M (Algorithm alg:mix) with a fixed learning rate for $T$ rounds gives [equation omitted]. Additionally, let $c_1$ and $c_2$ be the constants that make PAZO-M satisfy $(\varepsilon, \delta)$-differential privacy for any $\varepsilon < c_1 b^2 T / n^2$ and $\delta > 0$. Then PAZO-M obtains the error rate [equation omitted].
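The privacy condition in Theorem 4.1 has the same shape as the moments-accountant result for DP-SGD (Abadi et al., 2016). Reading $b$ as the batch size and $n$ as the number of private examples (our assumption), the sampling rate is $q = b/n$ and the requirement $\varepsilon < c_1 b^2 T/n^2$ becomes $\varepsilon < c_1 q^2 T$. In DP-SGD the matching noise condition is $\sigma \ge c_2 q \sqrt{T \log(1/\delta)}/\varepsilon$; the sketch below assumes PAZO-M follows the same template, which the summary above does not confirm. The constants $c_1, c_2$ are unspecified, so they are passed in as hypothetical inputs.

```python
import math

def sampling_rate(b: int, n: int) -> float:
    # q = b / n, with b the batch size and n the private dataset size (assumed).
    return b / n

def epsilon_upper_limit(c1: float, b: int, n: int, T: int) -> float:
    # Theorem 4.1 requires epsilon < c1 * b^2 * T / n^2, i.e. c1 * q^2 * T.
    return c1 * sampling_rate(b, n) ** 2 * T

def min_noise_multiplier(c2: float, b: int, n: int, T: int,
                         eps: float, delta: float) -> float:
    # Assumed DP-SGD-style noise condition: sigma >= c2 * q * sqrt(T log(1/delta)) / eps.
    q = sampling_rate(b, n)
    return c2 * q * math.sqrt(T * math.log(1.0 / delta)) / eps
```

For example, with $b = 100$, $n = 10{,}000$, and $T = 10{,}000$ rounds, $q = 0.01$ and the admissible range is $\varepsilon < c_1$; longer training or larger batches widen the range quadratically in $q$ and linearly in $T$.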

Figures (8)

  • Figure 1: Results on CIFAR-10 with NFResNet18 trained from scratch under privacy budget $\varepsilon=3$. Left: zeroth-order methods maintain more stable accuracy across privacy budgets than the best first-order method with public data. Right: the proposed zeroth-order approaches (PAZO-*) are more accurate than vanilla DPZero and significantly more efficient than all first-order baselines augmented with public data.
  • Figure 2: Performance of PAZO and the baselines in four settings. It shows that (1) all three PAZO variants outperform DPZero across all datasets, (2) all of the first-order methods (DP-SGD, DPMD, DOPE-SGD, and GEP), with or without public data, are more sensitive to smaller $\varepsilon$'s than zeroth-order ones, and (3) when $\varepsilon$'s are small, PAZO is superior to first-order baselines. "Fail" indicates failure to converge; the detailed accuracy numbers are in the per-dataset tables (table:cifar10 through table:mnli).
  • Figure 3: We compare the best private zeroth-order (ZO) methods with the best private first-order (FO) methods, with public data (+PUB) or without. Note that ZO+PUB is PAZO. It shows that (1) with or without public data, the performance gap between ZO and FO decreases as $\varepsilon$ decreases, (2) using public data expands the range of $\varepsilon$'s where ZO methods outperform FO ones, and (3) ZO+PUB (PAZO) achieves better privacy/utility tradeoff than FO+PUB when $\varepsilon$'s are small.
  • Figure 4: Convergence speed of private zeroth-order methods with (PAZO) or without (DPZero) public data. The PAZO variants converge at slightly different speeds, but all are consistently faster than the baseline. Curves show smoothed test accuracy under privacy budget $\varepsilon=1$.
  • Figure 5: The utility/speed tradeoffs of different methods. It shows that PAZO is up to 16$\times$ faster per training iteration than FO and FO+PUB while achieving comparable performance. The reported results are under privacy budget $\varepsilon=1$, and the detailed numbers are in the runtime table (table:speed).
  • ...and 3 more figures

Theorems & Definitions (17)

  • Definition 2.1: Differential privacy (Dwork et al., 2006)
  • Definition 4.1: $\gamma$-similarity
  • Theorem 4.1: Convergence of PAZO-M
  • Theorem 4.2: Convergence of PAZO-P
  • Theorem 4.3: Convergence of PAZO-S
  • Lemma B.1
  • Lemma B.2: from DPZero (Lemmas C.1 and C.2)
  • Theorem B.3: Full statement of Theorem 4.1
  • Proof
  • Theorem B.4: Full statement of Theorem 4.2
  • ...and 7 more
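Definition 2.1 is listed by title only. For reference, the standard $(\varepsilon, \delta)$ form, which is presumably what it states given the $(\varepsilon, \delta)$-DP guarantee in Theorem 4.1, reads as follows, where $D$ and $D'$ are datasets differing in a single example:

```latex
\Pr\big[\mathcal{M}(D) \in S\big] \;\le\; e^{\varepsilon}\, \Pr\big[\mathcal{M}(D') \in S\big] + \delta
\qquad \text{for all adjacent } D, D' \text{ and all measurable } S \subseteq \operatorname{Range}(\mathcal{M}).
```

Setting $\delta = 0$ recovers pure $\varepsilon$-differential privacy.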