QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution

Zixi Wang, M. Cenk Gursoy

TL;DR

QMGeo introduces a differential privacy mechanism for federated learning that needs no additive noise: a stochastic quantization scheme based on a mixed truncated geometric distribution. By quantizing each scalar update with a carefully designed probability distribution over $R$ levels, the method delivers $\epsilon$-DP and Rényi DP (RDP) guarantees while maintaining accuracy comparable to unquantized baselines, and the quantization itself reduces communication cost. The work provides per-dimension and multi-dimensional DP analyses, an optimality-gap bound under standard smoothness and PL assumptions, and empirical validation on MNIST showing favorable privacy-utility trade-offs. This approach offers a practical avenue for privacy-preserving FL in resource-constrained environments, leveraging quantization as an intrinsic privacy mechanism rather than a mere compression step.

Abstract

Federated learning (FL) is a framework that allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while keeping their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, keeping the users' datasets local has been shown to be insufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and the majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly used method that reduces the communication cost by transmitting a compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone to providing privacy guarantees has not yet been extensively analyzed. In this paper, we present a novel stochastic quantization method that uses a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide a convergence analysis for our framework and empirically study its performance.
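
To make the idea of privacy from quantization alone more concrete, the following is a minimal sketch of one way a scalar could be quantized with a mixture of truncated geometric distributions. It is not the paper's exact definition of the quantizer: the uniform levels on $[-c, c]$, the mixing rule (standard stochastic-rounding weights on the two neighboring levels), and all function names are assumptions made for illustration. The helper functions compute the largest log-probability ratio and the Rényi divergence between the output distributions of one pair of inputs, using the standard definitions of those quantities.

```python
import numpy as np

def truncated_geometric_pmf(center, R, p):
    """PMF over level indices 0..R-1 decaying as (1 - p)**distance from `center`."""
    distance = np.abs(np.arange(R) - center)
    weights = (1.0 - p) ** distance
    return weights / weights.sum()

def mixed_pmf(w, c, R, p):
    """Hypothetical output PMF for scalar w over R uniform levels on [-c, c].

    Assumption: two truncated geometric PMFs, centered at the two levels
    surrounding w, are mixed with stochastic-rounding weights.
    """
    levels = np.linspace(-c, c, R)
    step = levels[1] - levels[0]
    w = float(np.clip(w, -c, c))
    lo = min(int(np.floor((w + c) / step)), R - 2)  # index of the level just below w
    frac = min(max((w - levels[lo]) / step, 0.0), 1.0)
    pmf = (1.0 - frac) * truncated_geometric_pmf(lo, R, p) \
        + frac * truncated_geometric_pmf(lo + 1, R, p)
    return levels, pmf

def quantize(w, c, R, p, rng):
    """Draw one quantized value for w; no additive noise is involved."""
    levels, pmf = mixed_pmf(w, c, R, p)
    return float(rng.choice(levels, p=pmf))

def max_log_ratio(pmf_a, pmf_b):
    """Largest absolute log-probability ratio between two output PMFs."""
    return float(np.max(np.abs(np.log(pmf_a / pmf_b))))

def renyi_divergence(pmf_a, pmf_b, alpha):
    """Renyi divergence of order alpha > 1 between two discrete PMFs."""
    return float(np.log(np.sum(pmf_a**alpha * pmf_b**(1.0 - alpha))) / (alpha - 1.0))

rng = np.random.default_rng(0)
_, pmf_a = mixed_pmf(-1.0, c=1.0, R=8, p=0.5)
_, pmf_b = mixed_pmf(1.0, c=1.0, R=8, p=0.5)
eps = max_log_ratio(pmf_a, pmf_b)                 # finite: every level has mass
rdp = renyi_divergence(pmf_a, pmf_b, alpha=2.0)   # never exceeds eps
sample = quantize(0.3, c=1.0, R=8, p=0.5, rng=rng)
```

Because every level receives non-zero probability for every input, the log ratio stays finite for this pair of inputs. The sketch only compares one pair of inputs in one dimension; a guarantee would require the worst case over all admissible input pairs, which is what the paper's analysis addresses.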

Paper Structure (20 sections, 30 equations, 6 figures)

Figures (6)

  • Figure 1: Illustration of how $\mathcal{Q}_{\text{MGeo}}(\cdot)$ quantizes a scalar input value $w$. The dotted line shows the probability of $w$ being quantized to each quantization level. Every quantization level is assigned a non-zero probability. The larger $p$ is, the more skewed the distribution becomes.
  • Figure 2: An example of stochastic $k$-level quantization failing to provide DP. The solid orange and blue lines are the possible output quantization levels for $g$ and $g^{'}$, respectively. By observing the output of the quantization, any adversary could distinguish between $g$ and $g^{'}$. The grey dotted lines are quantization levels that lie in the range of neither $Q(g)$ nor $Q(g^{'})$, so any of them being chosen as $v$ would render $\epsilon$ unbounded.
  • Figure 3: The $\epsilon$ of the $\epsilon$-DP guarantee as a function of the parameter $p$ of $\mathcal{Q}_{\text{MGeo}}(\cdot)$, the success rate of the mixed truncated geometric distribution. Each curve corresponds to a different number of quantization levels.
  • Figure 4: Plot of $\epsilon(\alpha)$ vs. $\alpha$. The y-axis is the achieved $\epsilon$, and the x-axis is the corresponding $\alpha$. The curve is obtained with $\mathcal{Q}_{\text{MGeo}}(\cdot)$ with $R=8$ and $p=0.5$.
  • Figure 5: The two solid curves correspond to $p=0.5$ and $p=0.9$ in $\mathcal{Q}_{\text{MGeo}}(\cdot)$, respectively, with the number of quantization levels $R=8$. The dotted curve is the baseline, in which no quantization is applied. The y-axis shows the accuracy of the global model on the hold-out test set, and the x-axis shows the number of communication rounds. The $\epsilon$ values in the legend are computed using (\ref{eq:RDP-multiD}) with $\alpha=2$.
  • ...and 1 more figure
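
The failure described in the Figure 2 caption can be reproduced numerically. The sketch below assumes that "stochastic $k$-level quantization" means the usual unbiased stochastic rounding to the two levels surrounding the input; the level grid, the two inputs, and the function names are hypothetical choices for illustration. When the two inputs fall in different intervals, some outputs are possible for one input and impossible for the other, so the log-probability ratio is infinite.

```python
import math

def stochastic_round_pmf(g, levels):
    """PMF of unbiased stochastic rounding of g onto sorted `levels`.

    Assumes levels[0] <= g <= levels[-1]; only the two levels that
    surround g receive non-zero probability.
    """
    pmf = [0.0] * len(levels)
    for i in range(len(levels) - 1):
        lo, hi = levels[i], levels[i + 1]
        if lo <= g <= hi:
            frac = (g - lo) / (hi - lo)
            pmf[i] += 1.0 - frac
            pmf[i + 1] += frac
            break
    return pmf

def privacy_loss(pmf_a, pmf_b):
    """Largest absolute log-probability ratio; infinite if the supports differ."""
    worst = 0.0
    for a, b in zip(pmf_a, pmf_b):
        if a == 0.0 and b == 0.0:
            continue  # output impossible under both inputs
        if a == 0.0 or b == 0.0:
            return math.inf  # output reveals which input was used
        worst = max(worst, abs(math.log(a / b)))
    return worst

levels = [-1.0, -0.5, 0.0, 0.5, 1.0]          # hypothetical 5-level grid
pmf_g = stochastic_round_pmf(-0.8, levels)     # mass only on -1.0 and -0.5
pmf_g_prime = stochastic_round_pmf(0.7, levels)  # mass only on 0.5 and 1.0
loss = privacy_loss(pmf_g, pmf_g_prime)
```

Here `loss` is infinite because the two output supports are disjoint, which is the situation the caption describes. Assigning non-zero probability to every level, as in Figure 1, is what keeps the ratio bounded.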