
A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Yan Sun, Li Shen, Dacheng Tao

TL;DR

A novel Aligned Federated Primal Dual (A-FedPD) method is proposed, which constructs virtual dual updates to align the global consensus and the local dual variables of local clients that have not participated for a long time.

Abstract

As a popular paradigm for balancing data privacy and collaborative training, federated learning (FL) is widely used to process large-scale heterogeneous datasets distributed across edge clients. Due to bandwidth limitations and security considerations, it splits the original problem into multiple subproblems that are solved in parallel, which makes primal-dual solutions highly valuable in FL. In this paper, we review the recent development of classical federated primal-dual methods and point out a serious common defect of such methods in non-convex scenarios, which we call "dual drift", caused by the dual hysteresis of long-inactive clients under partial participation training. To address this problem, we propose a novel Aligned Federated Primal Dual (A-FedPD) method, which constructs virtual dual updates to align the global consensus and the local dual variables of local clients that have not participated for a long time. Meanwhile, we provide a comprehensive analysis of the optimization and generalization efficiency of the A-FedPD method on smooth non-convex objectives, which confirms its high efficiency and practicality. Extensive experiments are conducted on several classical FL setups to validate the effectiveness of the proposed method.
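To make the idea of a stale dual variable concrete, here is a minimal toy sketch of one communication round of a scalar federated primal-dual scheme under partial participation. It is not the algorithm from the paper: the quadratic local objectives, the function `run_round`, the aggregation rule, and in particular the form of the "virtual" dual step (treating the new global model as a stand-in for an inactive client's local solution) are all assumptions made for illustration.

```python
def run_round(theta_g, duals, centers, active, rho=1.0, virtual_dual=False):
    """One round of a toy scalar federated primal-dual scheme (illustrative only).

    Client i holds the hypothetical local objective f_i(x) = 0.5 * (x - centers[i])**2.
    """
    new_duals = list(duals)
    messages = []
    for i in sorted(active):
        # Closed-form minimizer of
        #   f_i(x) + duals[i] * (x - theta_g) + (rho / 2) * (x - theta_g)**2
        theta_i = (centers[i] - duals[i] + rho * theta_g) / (1.0 + rho)
        # Dual ascent step for a participating client.
        new_duals[i] = duals[i] + rho * (theta_i - theta_g)
        messages.append(theta_i + new_duals[i] / rho)
    # ASSUMPTION: the server averages the dual-corrected local models.
    new_theta_g = sum(messages) / len(messages)

    if virtual_dual:
        # ASSUMPTION: inactive clients receive a "virtual" dual step that uses
        # the new global model in place of their (never computed) local solution.
        for i in range(len(duals)):
            if i not in active:
                new_duals[i] = duals[i] + rho * (new_theta_g - theta_g)
    return new_theta_g, new_duals


centers = [0.0, 1.0, 2.0, 3.0]
duals0 = [0.0, 0.0, 0.0, 0.0]
active = {0, 1}  # clients 2 and 3 do not participate in this round

g_stale, stale = run_round(0.0, duals0, centers, active, rho=1.0, virtual_dual=False)
g_aligned, aligned = run_round(0.0, duals0, centers, active, rho=1.0, virtual_dual=True)
print(stale)    # duals of inactive clients keep their old values
print(aligned)  # duals of inactive clients move with the global model
```

Without the virtual step, the duals of clients 2 and 3 are frozen while the global model moves, and the gap grows with every round they miss. The virtual step in this sketch keeps them moving together with the consensus variable.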

Paper Structure (33 sections, 8 theorems, 55 equations, 7 figures, 9 tables, 2 algorithms)


Key Result

Theorem 1

Let the non-convex objective $f$ satisfy the smoothness assumption, and let $\rho$ be a positive constant. The sequence $\{\overline{\theta}^t\}_{t=0}^{T}$ generated by the algorithm satisfies: where $f^\star$ is the optimum and $R_{0} = \frac{1}{C}\sum_{i\in\mathcal{C}}\mathbb{E}_t\Vert\theta_i^{1} - \theta^{0}\Vert^2$ is the first local training volume.
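The quantity $R_0$ in the statement is the average squared distance between each client's first local model $\theta_i^{1}$ and the initial model $\theta^{0}$. As a small illustration, the sketch below evaluates it for a single realization, so the expectation $\mathbb{E}_t$ is dropped. The function name `initial_local_volume` and the example vectors are hypothetical.

```python
def initial_local_volume(theta0, local_thetas):
    """Average squared Euclidean distance between local models and theta0.

    Mirrors R_0 = (1/C) * sum_i ||theta_i^1 - theta^0||^2 for one realization
    (the expectation in the definition is not estimated here).
    """
    total = 0.0
    for theta_i in local_thetas:
        total += sum((a - b) ** 2 for a, b in zip(theta_i, theta0))
    return total / len(local_thetas)


theta0 = [0.0, 0.0]
local_thetas = [[3.0, 4.0], [0.0, 0.0]]  # hypothetical first local models of C = 2 clients
r0 = initial_local_volume(theta0, local_thetas)
print(r0)  # (25 + 0) / 2 = 12.5
```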

Figures (7)

  • Figure 1: "Dual drift" issue of the federated primal dual method under different participation ratios. When the participation ratio is low, dual drift introduces a very large variance, yielding divergence.
  • Figure 2: Evaluation of the proposed A-FedPD method under different participation ratios, different local intervals, and different numbers of rounds. In these experiments, we fix the total number of training samples and the total number of training iterations and observe the resulting trends.
  • Figure 3: Wall-clock time of the training process after a total of 600 communication rounds.
  • Figure 4: Label ratios under different splitting manners. Different colors denote samples with different labels. We show the different splitting distributions on a total of 100 clients.
  • Figure 5: Introducing brightness biases to different clients. We calculate the average brightness to keep each sample in a proper state. Each client randomly samples Gaussian noise to perturb its local samples.
  • ...and 2 more figures
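
The client-level brightness perturbation described in the Figure 5 caption might look roughly like the following sketch. The caption does not give the procedure, so everything here is an assumption: pixels are taken to lie in [0, 1], each client draws a single Gaussian offset shared by all of its samples, and results are clipped back to the valid range. The average-brightness control mentioned in the caption is not modeled, and the name `add_client_brightness_bias` is hypothetical.

```python
import random


def add_client_brightness_bias(client_images, sigma=0.1, seed=0):
    """Shift every pixel of a client's images by one client-specific offset.

    client_images: list (per client) of images, each image a flat list of
    pixel intensities in [0, 1]. Returns the perturbed data and the offsets.
    """
    rng = random.Random(seed)
    biased, offsets = [], []
    for images in client_images:
        offset = rng.gauss(0.0, sigma)  # ASSUMPTION: one Gaussian draw per client
        offsets.append(offset)
        biased.append(
            [[min(1.0, max(0.0, p + offset)) for p in img] for img in images]
        )
    return biased, offsets


clients = [[[0.5, 0.5]], [[0.5, 0.5]], [[0.2, 0.8]]]  # three hypothetical clients
biased, offsets = add_client_brightness_bias(clients, sigma=0.1, seed=0)
print(offsets)
```

Because the offset is shared within a client but differs across clients, the local datasets differ in a feature-level way even when their label distributions match.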

Theorems & Definitions (11)

  • Theorem 1
  • Remark 1.1
  • Remark 1.2
  • Theorem 2
  • Remark 2.1
  • Lemma 1: acar2021federated
  • Lemma 2
  • Lemma 3
  • Lemma 4: hardt2016train
  • Lemma 5
  • ...and 1 more