Smoothed Normalization for Efficient Distributed Private Optimization

Egor Shulgin; Sarit Khirirat; Peter Richtárik

TL;DR

This paper tackles the challenge of privately training smooth, non-convex models in distributed settings, where the bias introduced by gradient clipping hinders convergence. It introduces α-NormEC, a distributed algorithm that replaces clipping with smoothed normalization within the EF21 error-feedback framework, and extends it to differential privacy (DP). The authors prove an $O\left(\frac{1}{\sqrt{K}}\right)$ convergence rate in the non-private setting without assuming bounded gradient norms, and provide DP convergence guarantees with explicit utility bounds. Experiments on CIFAR-10 with ResNet20 show robust performance across parameter settings. The work yields the first provable convergence guarantees for a DP distributed non-convex optimizer under standard assumptions and demonstrates practical advantages over DP-SGD and Clip21, with server normalization further enhancing private-training stability.

Abstract

Federated learning enables training machine learning models while preserving the privacy of participants. Surprisingly, there is no differentially private distributed method for smooth, non-convex optimization problems. The reason is that standard privacy techniques require bounding the participants' contributions, usually enforced via $\textit{clipping}$ of the updates. Existing literature typically ignores the effect of clipping by assuming the boundedness of gradient norms or analyzes distributed algorithms with clipping but ignores DP constraints. In this work, we study an alternative approach via $\textit{smoothed normalization}$ of the updates motivated by its favorable performance in the single-node setting. By integrating smoothed normalization with an error-feedback mechanism, we design a new distributed algorithm $α$-$\sf NormEC$. We prove that our method achieves a superior convergence rate over prior works. By extending $α$-$\sf NormEC$ to the DP setting, we obtain the first differentially private distributed optimization algorithm with provable convergence guarantees. Finally, our empirical results from neural network training indicate robust convergence of $α$-$\sf NormEC$ across different parameter settings.
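To make the contrast between clipping and smoothed normalization concrete, here is a minimal sketch in plain Python. It assumes the commonly used form of smoothed normalization, $g / (\alpha + \|g\|)$; the abstract does not state the operator explicitly, so treat this form, and the helper names `clip` and `smoothed_normalize`, as illustrative assumptions rather than the paper's exact definitions.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip(v, tau):
    """Standard clipping: rescale v only when its norm exceeds tau."""
    n = l2_norm(v)
    scale = min(1.0, tau / n) if n > 0 else 1.0
    return [scale * x for x in v]


def smoothed_normalize(v, alpha):
    """Smoothed normalization (assumed form): v / (alpha + ||v||), alpha > 0.

    The output norm is ||v|| / (alpha + ||v||), which is always below 1,
    so every update is bounded without a hard threshold.
    """
    n = l2_norm(v)
    return [x / (alpha + n) for x in v]


small = [0.03, 0.04]   # norm 0.05
large = [3.0, 4.0]     # norm 5.0

# Clipping leaves the small vector untouched and truncates the large one.
clipped_small = clip(small, tau=1.0)
clipped_large = clip(large, tau=1.0)

# Smoothed normalization rescales both, keeping each norm strictly below 1.
normed_small = smoothed_normalize(small, alpha=1.0)
normed_large = smoothed_normalize(large, alpha=1.0)
```

Both operators bound the contribution of a single participant, which is what standard DP noise mechanisms require. Under the assumed form, the difference is that clipping is piecewise (identity below the threshold, projection above it), while smoothed normalization applies one smooth rescaling to every input.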
