
DP-HYPE: Distributed Differentially Private Hyperparameter Search

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

TL;DR

DP-Hype tackles the challenge of privately tuning hyperparameters in distributed learning without a trusted aggregator. It combines per-client local evaluations, private top-$k$ voting, and secure aggregation with Rényi-DP-based privacy accounting to select a compromise hyperparameter that generalizes across clients. The method achieves client-level DP independent of the number of hyperparameters and provides utility guarantees with explicit bounds, while remaining scalable to large numbers of clients and hyperparameters. Empirical results on MNIST, CIFAR-10, and Adult show strong performance under iid and non-iid data, even for small privacy budgets, and the approach is implemented as a Flower submodule for practical use.
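The TL;DR names secure aggregation without describing it. One common way to realize secure summation is pairwise additive masking, sketched below as a generic illustration rather than DP-Hype's or Flower's actual protocol. Assumptions: real-valued noisy votes are fixed-point encoded into $\mathbb{Z}_M$ with the hypothetical constants `MODULUS` and `SCALE`, and all pairwise masks are drawn from one seeded generator, whereas a deployed protocol would derive each mask from a key agreement between the two clients and handle client dropouts. All function names are hypothetical.

```python
import random

MODULUS = 2**32  # ring Z_M in which masked values live (illustrative choice)
SCALE = 2**10    # fixed-point scale for real-valued noisy vote entries (assumption)

def encode(x: float) -> int:
    """Map a real number into Z_M via fixed-point rounding."""
    return round(x * SCALE) % MODULUS

def decode(y: int) -> float:
    """Inverse of encode, valid while the real-valued sum stays below MODULUS / (2 * SCALE)."""
    if y >= MODULUS // 2:
        y -= MODULUS
    return y / SCALE

def make_pairwise_masks(n: int, p: int, seed: int = 0) -> dict:
    """One shared random mask vector per client pair (i < j).
    Simulated with a single seeded RNG; a real protocol uses pairwise key agreement."""
    rng = random.Random(seed)
    return {(i, j): [rng.randrange(MODULUS) for _ in range(p)]
            for i in range(n) for j in range(i + 1, n)}

def mask(i: int, vec: list, masks: dict, n: int) -> list:
    """Client i adds masks shared with higher-indexed clients and subtracts the rest."""
    out = [encode(x) for x in vec]
    for j in range(n):
        if j == i:
            continue
        shared = masks[(min(i, j), max(i, j))]
        sign = 1 if i < j else -1
        out = [(o + sign * m) % MODULUS for o, m in zip(out, shared)]
    return out

def server_sum(masked_vectors: list) -> list:
    """The server sees only masked vectors; the pairwise masks cancel in the entry-wise sum."""
    return [decode(sum(col) % MODULUS) for col in zip(*masked_vectors)]

# Usage: three clients, each holding a (hypothetical) noisy voting vector of length p = 2.
vectors = [[1.5, -0.25], [0.0, 2.0], [-1.0, 0.5]]
masks = make_pairwise_masks(n=3, p=2)
masked = [mask(i, v, masks, n=3) for i, v in enumerate(vectors)]
print(server_sum(masked))  # [0.5, 2.25]
```

Each masked vector on its own looks uniformly random to the server, yet the entry-wise sum equals the sum of the clients' vectors up to fixed-point rounding.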

Abstract

The tuning of hyperparameters in distributed machine learning can substantially impact model performance. When the hyperparameters are tuned on sensitive data, privacy becomes an important challenge, and differential privacy has emerged as the de facto standard for provable privacy. A standard setting in distributed learning is that clients agree on a shared setup, i.e., a compromise among a set of candidate hyperparameters such as the learning rate of the model to be trained. Yet, prior work on differentially private hyperparameter tuning either uses computationally expensive cryptographic protocols, determines hyperparameters separately for each client, or applies differential privacy locally, which can lead to undesirable utility-privacy trade-offs. In this work, we present our algorithm DP-HYPE, which performs a distributed and privacy-preserving hyperparameter search by conducting a distributed vote based on the clients' local hyperparameter evaluations. In this way, DP-HYPE selects hyperparameters that represent a compromise supported by the majority of clients, while remaining scalable and independent of specific learning tasks. We prove that DP-HYPE satisfies a strong notion of differential privacy, client-level differential privacy, and, importantly, show that its privacy guarantees do not depend on the number of hyperparameters. We also provide bounds on its utility, that is, on the probability of reaching a compromise, and implement DP-HYPE as a submodule in the popular Flower framework for distributed machine learning. In addition, we evaluate DP-HYPE on multiple benchmark data sets in iid as well as several non-iid settings and demonstrate high utility even under small privacy budgets.

Paper Structure

This paper contains 40 sections, 4 theorems, 8 equations, 11 figures, 3 tables, 1 algorithm.

Key Result

Theorem 2.1

If a randomized algorithm $\mathcal{M}$ satisfies $(\alpha, \varepsilon)$-RDP then it also satisfies $(\varepsilon + \log((\alpha-1)/\alpha) - (\log \delta + \log \alpha)/(\alpha-1), \delta)$-DP for any $\delta \in (0,1)$.
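Theorem 2.1 and Definition 2.3 (Gaussian Mechanism) can be combined numerically. The sketch below assumes the standard RDP bound for the Gaussian mechanism, $\varepsilon(\alpha) = \alpha\Delta_2^2/(2\sigma^2)$ for $L_2$-sensitivity $\Delta_2$ and noise scale $\sigma$ (Mironov, 2017), converts it to $(\varepsilon, \delta)$-DP with the formula of Theorem 2.1, and minimizes over a hypothetical grid of orders $\alpha$. It illustrates the conversion only and is not the paper's accounting code; the function names and the grid are assumptions.

```python
import math

def rdp_to_dp(alpha: float, rdp_eps: float, delta: float) -> float:
    """Theorem 2.1: (alpha, rdp_eps)-RDP implies (returned eps, delta)-DP."""
    if alpha <= 1 or not 0 < delta < 1:
        raise ValueError("need alpha > 1 and 0 < delta < 1")
    return (rdp_eps + math.log((alpha - 1) / alpha)
            - (math.log(delta) + math.log(alpha)) / (alpha - 1))

def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP curve of the Gaussian mechanism: alpha * Delta^2 / (2 * sigma^2)."""
    return alpha * sensitivity**2 / (2 * sigma**2)

def gaussian_dp_epsilon(sensitivity: float, sigma: float, delta: float, alphas=None) -> float:
    """Smallest DP epsilon over a grid of RDP orders (hypothetical default grid 1.1 ... 100.9)."""
    if alphas is None:
        alphas = [1 + x / 10 for x in range(1, 1000)]
    return min(rdp_to_dp(a, gaussian_rdp(a, sensitivity, sigma), delta) for a in alphas)

# Usage with a hypothetical sensitivity of sqrt(3) and noise scale sigma = 20.
print(gaussian_dp_epsilon(math.sqrt(3), sigma=20.0, delta=1e-5))
```

Because the result is a minimum over the grid, adding orders to `alphas` can only lower or keep the reported $\varepsilon$.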

Figures (11)

  • Figure 1: A conceptual overview of DP-Hype, our privacy-preserving algorithm for distributed hyperparameter search. The set of hyperparameters $H = \{H_1, \dots, H_p\}$ and the loss function $\mathcal{L}$ for the given learning task are publicly available. First, each client locally computes the loss for each hyperparameter and selects the top-$k$ hyperparameters with the smallest loss. It then creates a voting vector $\mathbf{v}_i$ with a $1$ at each position corresponding to one of its top-$k$ hyperparameters and a $0$ elsewhere. Each $\mathbf{v}_i$ is noised, and the noisy vectors are aggregated entry-wise on the server side via secure summation to obtain $\tilde{\mathbf{v}}$. The server then outputs the hyperparameter with the most noisy votes.
  • Figure 2: Simulated local losses (a) and a varying number of good hyperparameters (b), showing the impact of $k$ on the accuracy of DP-Hype under varying privacy budgets $\varepsilon$ with $\delta=10^{-5}$ and $n=250$. There are $100$ hyperparameters; the local losses of good and bad hyperparameters follow $\mathcal{N}(0,\sigma_{\mathrm{loss}}^2)$ and $\mathcal{N}(1,\sigma_{\mathrm{loss}}^2)$, respectively. Varying $\sigma_{\mathrm{loss}}$ creates different degrees of overlap between the two loss distributions, which reveals the advantage of $k>1$. Varying the fraction of good hyperparameters together with $k$ reveals the underlying privacy–utility trade-off.
  • Figure 3: Communication overhead introduced by DP-Hype in terms of voting vector length. Each client only sends its voting vector of length $p=|H|$ to the server (left), and the server processes at least $n$ voting vectors (right). This is the raw overhead without secure summation; secure summation adds further communication, but that cost scales with what clients and the server already have to process.
  • Figure 4: Histogram of individual accuracy values obtained by training a global model with SGD on each data set, using the set of hyperparameters shown in the hyperparameter table. As expected, the accuracies vary drastically depending on the data set.
  • Figure 5: Accuracy of DP-Hype, RandGuess, and Opt on MNIST, CIFAR-10, and Adult for a varying number of clients and different privacy budgets. The set of hyperparameters contains many bad candidates, as RandGuess is far from Opt. Yet DP-Hype comes very close to the best possible result even for small privacy budgets, and this effect becomes stronger as the number of clients grows.
  • ...and 6 more figures
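The selection step described in the Figure 1 caption and the toy setup in the Figure 2 caption can be simulated in a few lines. This is a minimal sketch, not the Flower submodule: it assumes each client adds $\mathcal{N}(0, \sigma^2/n)$ noise per entry so that the aggregate carries $\mathcal{N}(0, \sigma^2)$ noise, replaces secure summation with a plain sum, and uses hypothetical function names and parameter values (`n_good`, `k`, `sigma`, `sigma_loss`, `trials`). Only $n=250$ and $p=100$ are taken from the Figure 2 caption, and calibrating `sigma` to a target $(\varepsilon, \delta)$ is left out.

```python
import numpy as np

def vote_vector(losses, k):
    """Top-k voting vector: 1 for the k hyperparameters with the smallest local loss, 0 elsewhere."""
    v = np.zeros(len(losses))
    v[np.argsort(losses)[:k]] = 1.0
    return v

def dp_hype_select(local_losses, k, sigma, rng):
    """Return the index of the hyperparameter with the most noisy votes.

    local_losses has shape (n, p): one row of losses per client.
    Assumption: each client adds N(0, sigma^2 / n) noise per entry, so the aggregate
    carries N(0, sigma^2) noise. A plain sum stands in for secure summation.
    """
    n, p = local_losses.shape
    noisy_votes = [vote_vector(row, k) + rng.normal(0.0, sigma / np.sqrt(n), size=p)
                   for row in local_losses]
    v_tilde = np.sum(noisy_votes, axis=0)
    return int(np.argmax(v_tilde))

def simulate(n=250, p=100, n_good=10, sigma_loss=0.5, k=3, sigma=20.0, trials=100, seed=0):
    """Fraction of trials in which a good hyperparameter (index < n_good) is selected.

    Good losses ~ N(0, sigma_loss^2) and bad losses ~ N(1, sigma_loss^2), as in Figure 2.
    """
    rng = np.random.default_rng(seed)
    means = np.concatenate([np.zeros(n_good), np.ones(p - n_good)])
    hits = 0
    for _ in range(trials):
        losses = rng.normal(means, sigma_loss, size=(n, p))  # one loss row per client
        hits += dp_hype_select(losses, k, sigma, rng) < n_good
    return hits / trials

print(simulate())
```

Varying `sigma_loss`, `k`, and `n_good` in this toy setting loosely mirrors the two panels of Figure 2.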

Theorems & Definitions (10)

  • Definition 2.1: Differential Privacy (Dwork, 2006)
  • Definition 2.2: Rényi Differential Privacy (Mironov, 2017)
  • Theorem 2.1: From RDP to DP (Balle et al., 2020)
  • Definition 2.3: Gaussian Mechanism (Mironov, 2017)
  • Lemma 4.1: $L_2$-Sensitivity of $\mathbf{v}$
  • Proof
  • Theorem 4.1
  • Proof
  • Theorem 5.1: Selecting a Good Hyperparameter
  • Proof
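Lemma 4.1 is listed here without its statement. As a hedged sketch only, assuming add/remove-one-client adjacency and that every client casts exactly $k$ votes (each $\mathbf{v}_i \in \{0,1\}^p$ has exactly $k$ ones), the $L_2$-sensitivity of the summed vote vector $\mathbf{v} = \sum_{i=1}^{n} \mathbf{v}_i$ would follow directly, since removing client $j$ changes the sum by exactly $\mathbf{v}_j$:

$$
\Delta_2(\mathbf{v}) = \max_{D \simeq D'} \big\|\mathbf{v}(D) - \mathbf{v}(D')\big\|_2 = \max_{j} \|\mathbf{v}_j\|_2 = \sqrt{k}.
$$

Under replace-one adjacency, $\mathbf{v}_j - \mathbf{v}'_j$ has at most $2k$ nonzero entries in $\{-1, 1\}$, which gives $\sqrt{2k}$ instead. Neither bound involves $p = |H|$, which is consistent with the abstract's claim that the privacy guarantees do not depend on the number of hyperparameters.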