Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks

Duong Thuy Anh Nguyen, Su Wang, Duong Tung Nguyen, Angelia Nedich, H. Vincent Poor

TL;DR

This work investigates the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs and proposes a consensus-based algorithm called DSGTm-TV, which exhibits linear convergence to the exact global optimum when exact gradient information is available and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients.

Abstract

We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
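The abstract describes the update only in words. The Python sketch below illustrates the general pattern it names: a row-stochastic matrix mixes model parameters, a column-stochastic matrix mixes gradient trackers, and each agent adds heavy-ball momentum with its own stepsize and momentum parameter. It uses a generic AB/push-pull-style recursion and is not the paper's update rule (eq-met is not reproduced in this summary). The update order, initialization, scalar quadratic losses, two-graph switching topology, and all names (`topology`, `row_stochastic`, `column_stochastic`, `dsgtm_tv_sketch`) are hypothetical choices made for illustration.

```python
import random

def topology(k, n):
    """Hypothetical time-varying digraph: a directed ring whose orientation
    flips with the parity of k, plus one extra link that makes it unbalanced.
    Edges are (sender, receiver) pairs; self-loops are always present."""
    shift = 1 if k % 2 == 0 else n - 1
    edges = {(i, i) for i in range(n)}
    edges |= {(i, (i + shift) % n) for i in range(n)}
    edges.add((k % 2, (k % 2 + 2) % n))
    return edges

def row_stochastic(n, edges):
    """A[i][j] > 0 iff j sends to i; each receiver averages its in-neighbours."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        senders = [s for (s, r) in edges if r == i]
        for j in senders:
            A[i][j] = 1.0 / len(senders)
    return A

def column_stochastic(n, edges):
    """B[i][j] > 0 iff j sends to i; each sender splits its mass over out-neighbours."""
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        receivers = [r for (s, r) in edges if s == j]
        for i in receivers:
            B[i][j] = 1.0 / len(receivers)
    return B

def dsgtm_tv_sketch(a, b, alphas, betas, iters=5000, noise=0.0, seed=0):
    """Gradient tracking with heavy-ball momentum on f(x) = sum_i 0.5*a_i*(x - b_i)^2.
    alphas/betas hold per-agent (uncoordinated) stepsizes and momentum parameters.
    noise > 0 adds zero-mean Gaussian noise to every local gradient evaluation."""
    n = len(a)
    rng = random.Random(seed)

    def grad(i, xi):
        g = a[i] * (xi - b[i])
        if noise > 0:
            g += rng.gauss(0.0, noise)
        return g

    x = [0.0] * n                        # local model parameters
    x_prev = list(x)                     # previous iterate, for the momentum term
    g = [grad(i, x[i]) for i in range(n)]
    y = list(g)                          # trackers start at the local gradients
    for k in range(iters):
        edges = topology(k, n)
        A, B = row_stochastic(n, edges), column_stochastic(n, edges)
        # Parameter step: mix with A, descend along the tracker, add momentum.
        x_new = [sum(A[i][j] * x[j] for j in range(n))
                 - alphas[i] * y[i]
                 + betas[i] * (x[i] - x_prev[i]) for i in range(n)]
        g_new = [grad(i, x_new[i]) for i in range(n)]
        # Tracker step: mix with B, then add the change in the local gradient.
        y = [sum(B[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x_prev, x, g = x, x_new, g_new
    return x

if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0, 1.5], [4.0, -1.0, 2.0, 0.5]
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    x = dsgtm_tv_sketch(a, b, [0.04, 0.03, 0.02, 0.04], [0.1, 0.05, 0.15, 0.1])
    print(x_star, x)
```

With exact gradients (`noise=0.0`) and stepsizes this small, every local iterate should approach the minimizer of the sum, $\sum_i a_i b_i / \sum_i a_i$; with `noise > 0` the iterates should instead hover in a neighborhood of it, qualitatively mirroring the two regimes stated in the abstract. Because each tracker is initialized to its local gradient and B is column-stochastic, the sum of the trackers always equals the sum of the current local gradients. This property is what makes exact optimality reachable over directed, unbalanced links.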

Paper Structure

This paper contains 33 sections, 17 theorems, 127 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Consider the iterates $\{y_k^i\}_{i\in [n], k\ge 0}$ generated by the DSGTm-TV method in eq-met. Let Assumptions asm-functions, asm-SFO, and asm-bmatrices hold. Then, for all $k\ge0$, we have

Figures (4)

  • Figure 1: Traditional FL relies on a centralized server (Fig. fig:FL_centralized). Such a server is absent in decentralized FL (Fig. fig:FL_decentralized). We focus on the more challenging case of decentralized FL, with directed and unbalanced links among agents.
  • Figure 2: Our problem further assumes that directed links in decentralized FL are time-varying: as the iteration index changes, the directed links among network agents may shift.
  • Figure 3: Performance on MNIST dataset. Top: Normalized error, Middle: Training loss, Bottom: Accuracy on the test set.
  • Figure 4: Performance on the SMS Spam dataset. Left: performance comparison of DSGTm-TV versus SGD algorithms across different settings. Right: comparison of decentralized FL algorithms, namely the benchmark versus DSGTm-TV with deterministic and stochastic gradient variants.

Theorems & Definitions (40)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4: Generality of DSGTm-TV
  • Lemma 1: pu2021stochastic
  • Lemma 2: nguyen2022distributed, Lemma 5.4
  • Lemma 3: Angelia2022AB, Lemma 3.4
  • Lemma 4
  • Proof
  • Lemma 5
  • ...and 30 more