Smoothed Normalization for Efficient Distributed Private Optimization

Egor Shulgin, Sarit Khirirat, Peter Richtárik

TL;DR

This paper tackles the challenge of privately training non-convex, smooth models in distributed settings where gradient clipping biases hinder convergence. It introduces α-NormEC, a distributed algorithm that replaces clipping with smoothed normalization within the EF21 error-feedback framework, and extends it to differential privacy. The authors prove non-private $O\left(\frac{1}{\sqrt{K}}\right)$ convergence without bounded gradient-norm assumptions and provide DP convergence guarantees with explicit utility bounds, complemented by empirical results on CIFAR-10/ResNet20 showing robust performance across parameter settings. The work yields the first provable convergence guarantees for a DP distributed non-convex optimizer under standard assumptions and demonstrates practical advantages over DP-SGD and Clip21, with server normalization further enhancing private-training stability.
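
To make the contrast between clipping and smoothed normalization concrete, here is a minimal sketch. It assumes the commonly used form of the operator, g / (α + ‖g‖), which this summary does not state explicitly; the function names and the sample vectors are hypothetical and chosen only for illustration.

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def clip(v, tau):
    """Standard clipping: rescale v only if its norm exceeds tau."""
    n = l2_norm(v)
    scale = min(1.0, tau / n) if n > 0 else 1.0
    return [scale * x for x in v]

def smoothed_normalize(v, alpha):
    """Assumed smoothed normalization: v / (alpha + ||v||), with alpha > 0."""
    n = l2_norm(v)
    return [x / (alpha + n) for x in v]

small, large = [0.03, 0.04], [30.0, 40.0]  # hypothetical gradients
for g in (small, large):
    # Print the norm after clipping and after smoothed normalization.
    print(l2_norm(clip(g, 1.0)), l2_norm(smoothed_normalize(g, 0.01)))
```

Under this assumed definition, the output norm is strictly below 1 for every input when α > 0, which gives the bounded per-participant contribution that DP mechanisms need. Clipping also bounds the norm, but it acts only on vectors whose norm exceeds the threshold, whereas smoothed normalization rescales every input smoothly.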

Abstract

Federated learning enables training machine learning models while preserving the privacy of participants. Surprisingly, there is no differentially private distributed method for smooth, non-convex optimization problems. The reason is that standard privacy techniques require bounding the participants' contributions, usually enforced via clipping of the updates. Existing literature typically ignores the effect of clipping by assuming the boundedness of gradient norms or analyzes distributed algorithms with clipping but ignores DP constraints. In this work, we study an alternative approach via smoothed normalization of the updates motivated by its favorable performance in the single-node setting. By integrating smoothed normalization with an error-feedback mechanism, we design a new distributed algorithm $\alpha$-NormEC. We prove that our method achieves a superior convergence rate over prior works. By extending $\alpha$-NormEC to the DP setting, we obtain the first differentially private distributed optimization algorithm with provable convergence guarantees. Finally, our empirical results from neural network training indicate robust convergence of $\alpha$-NormEC across different parameter settings.
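
The combination of smoothed normalization with error feedback can be sketched as follows. This is an illustrative EF21-style round under stated assumptions, not the paper's algorithm as written: the operator form v / (α + ‖v‖), the update order, the helper names (`client_round`, `server_round`), the toy quadratic objectives, and the parameter values are all hypothetical. No DP noise is added here.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def smoothed_normalize(v, alpha):
    """Assumed operator: v / (alpha + ||v||); output norm is below 1 for alpha > 0."""
    n = norm(v)
    return [x / (alpha + n) for x in v]

def client_round(grad, state, alpha, beta):
    """Normalize the gap between the fresh gradient and the stored estimate,
    then move the estimate by beta times that bounded message."""
    message = smoothed_normalize([g - s for g, s in zip(grad, state)], alpha)
    new_state = [s + beta * m for s, m in zip(state, message)]
    return message, new_state

def server_round(x, server_state, messages, beta, gamma):
    """Average the bounded messages, update the aggregate estimate,
    and take a step of size gamma along it."""
    n = len(messages)
    avg = [sum(m[j] for m in messages) / n for j in range(len(x))]
    new_server_state = [s + beta * a for s, a in zip(server_state, avg)]
    new_x = [xi - gamma * si for xi, si in zip(x, new_server_state)]
    return new_x, new_server_state

# Toy problem (hypothetical): client i holds f_i(x) = 0.5 * ||x - b_i||^2.
targets = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
x = [5.0, -5.0]
states = [[0.0, 0.0] for _ in targets]
server_state = [0.0, 0.0]
alpha, beta, gamma = 1.0, 0.5, 0.1

for _ in range(200):
    grads = [[xi - bi for xi, bi in zip(x, b)] for b in targets]
    results = [client_round(g, s, alpha, beta) for g, s in zip(grads, states)]
    messages = [m for m, _ in results]
    states = [s for _, s in results]
    x, server_state = server_round(x, server_state, messages, beta, gamma)

print(x)  # final iterate of the toy run
```

In this sketch each client transmits only a message of norm below 1, so its contribution per round is bounded without any assumption on the gradient norm. A DP variant would perturb these bounded messages with noise before aggregation; that step is omitted because its details are not given in this summary.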

Paper Structure

This paper contains 55 sections, 7 theorems, 54 equations, 13 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

For any $\alpha \geq 0$, $\beta>0$, and $g\in\mathbb{R}^d$,

Figures (13)

  • Figure 1: Training loss and test accuracy of non-private $\alpha$-NormEC with $\alpha=0.01$ [solid], $0.1$ [dashed], and $1.0$ [dotted], and $\beta=0.01$ [blue], $0.1$ [green], $1.0$ [orange], and $10.0$ [red].
  • Figure 2: The highest test accuracy by $\alpha$-NormEC with different $\alpha$ and $\beta$ values.
  • Figure 3: Superior performance of $\alpha$-NormEC without server normalization [dashed] over DP-SGD with smoothed normalization [solid] in the non-private setting, in terms of training loss and test accuracy for different $\beta$ values (with fine-tuned step sizes).
  • Figure 4: The highest test accuracy of DP-$\alpha$-NormEC with [left] and without [center] Server Normalization (SN), and their difference [right].
  • Figure 5: The highest test accuracy of DP-Clip21.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Lemma 1
  • Example 1
  • Theorem 1: Non-private setting
  • Corollary 1: Non-private setting
  • Theorem 2: DP setting
  • Corollary 2: Utility guarantee in DP setting
  • Lemma 2: Non-private setting
  • proof
  • Lemma 3: DP setting
  • proof