Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks

Duong Thuy Anh Nguyen; Su Wang; Duong Tung Nguyen; Angelia Nedich; H. Vincent Poor

TL;DR

This work investigates the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs and proposes a consensus-based algorithm called DSGTm-TV, which exhibits linear convergence to the exact global optimum when exact gradient information is available and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients.

Abstract

We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
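The abstract names three ingredients: gradient tracking, heavy-ball momentum, and row- and column-stochastic mixing over time-varying directed graphs. The exact DSGTm-TV update is not reproduced in this summary, so the following is only a minimal sketch of a generic AB/Push-Pull-style recursion that combines those ingredients, not the authors' algorithm. The quadratic local objectives, the two hand-picked digraphs, the stepsize and momentum ranges, and the helper names `mixing_matrices` and `run` are all assumptions made for illustration.

```python
import numpy as np

def mixing_matrices(edges, n):
    """Build a row-stochastic A and a column-stochastic B from directed edges (src -> dst)."""
    adj = np.eye(n)                            # self-loops: each agent keeps its own value
    for src, dst in edges:
        adj[dst, src] = 1.0                    # adj[i, j] = 1 when agent j sends to agent i
    A = adj / adj.sum(axis=1, keepdims=True)   # rows sum to 1 (weights over in-neighbors)
    B = adj / adj.sum(axis=0, keepdims=True)   # columns sum to 1 (weights over out-neighbors)
    return A, B

def run(num_iters=3000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = 5, 2
    # Assumed local objectives: f_i(x) = 0.5 * ||x - targets[i]||^2,
    # so the global minimizer is the mean of the targets.
    targets = rng.normal(size=(n, d))
    grad = lambda x: x - targets               # row i is agent i's exact local gradient

    # Two strongly connected, unbalanced digraphs, alternated over time (assumed).
    g0 = [(i, (i + 1) % n) for i in range(n)] + [(0, 2)]
    g1 = [((i + 1) % n, i) for i in range(n)] + [(3, 1)]
    graphs = [mixing_matrices(g, n) for g in (g0, g1)]

    # Uncoordinated (per-agent) stepsizes and momentum parameters, chosen small.
    alpha = rng.uniform(0.02, 0.04, size=(n, 1))
    beta = rng.uniform(0.0, 0.1, size=(n, 1))

    x_prev = x = np.zeros((n, d))
    y = grad(x)                                # tracker starts at the local gradients
    for k in range(num_iters):
        A, B = graphs[k % 2]
        # Consensus step with A, descent along the tracked gradient, heavy-ball term.
        x_next = A @ x - alpha * y + beta * (x - x_prev)
        # Gradient tracking: mixing with B preserves the sum of the local gradients.
        y = B @ y + grad(x_next) - grad(x)
        x_prev, x = x, x_next
    return x, targets.mean(axis=0)

if __name__ == "__main__":
    x, optimum = run()
    print("max deviation from optimum:", np.abs(x - optimum).max())
```

Because `B` is column-stochastic, the sum of the rows of `y` always equals the sum of the current local gradients, which is why every agent's iterate can settle at the global minimizer rather than its own local one. Replacing `grad` with a noisy estimate would correspond to the stochastic-gradient setting, where the abstract states convergence only to a neighborhood of the optimum.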

Paper Structure (33 sections, 17 theorems, 127 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Literature Review
Outline and Summary of Contributions
Formulation of DSGTm-TV
Linear convergence
Generality of DSGTm-TV
Experimental evaluation
Notational Conventions
Problem Formulation
General Setting: Decentralized Stochastic Optimization
Assumptions
Communication Networks
The DSGTm-TV Algorithm
Compact Form
Gradient Tracking
...and 18 more sections

Key Result

Lemma 1

Consider the iterates $\{y_k^i\}_{i\in [n],\, k\ge 0}$ generated by the DSGTm-TV method in eq-met. Let Assumption asm-functions, Assumption asm-SFO, and Assumption asm-bmatrices hold. For all $k\ge 0$, we have

Figures (4)

Figure 1: Traditional FL relies on a centralized server, whereas decentralized FL has no such server. We focus on the more challenging case of decentralized FL, with directed and unbalanced links among agents.
Figure 2: Our problem further assumes that the directed links in decentralized FL are time-varying: the set of directed links among agents may change from one iteration to the next.
Figure 3: Performance on MNIST dataset. Top: Normalized error, Middle: Training loss, Bottom: Accuracy on the test set.
Figure 4: Performance on SMS Spam Dataset. Left: Performance comparison of DSGTm-TV versus SGD algorithms across different settings. Right: Comparison of decentralized FL algorithms: benchmark versus DSGTm-TV with deterministic and stochastic gradient variants.

Theorems & Definitions (40)

Remark 1
Remark 2
Remark 3
Remark 4: Generality of DSGTm-TV
Lemma 1: pu2021stochastic
Lemma 2: nguyen2022distributed, Lemma 5.4
Lemma 3: Angelia2022AB, Lemma 3.4
Lemma 4
proof
Lemma 5
...and 30 more
