Decentralized Directed Collaboration for Personalized Federated Learning

Yingqi Liu; Yifan Shi; Qinglun Li; Baoyuan Wu; Xueqian Wang; Li Shen

TL;DR

This work tackles personalized federated learning in a decentralized setting with heterogeneous data and device resources by introducing DFedPGP, a directed, partially personalized gradient-push framework. It partitions each client's model into a shared backbone $u$ and a client-specific head $v_i$, and uses Push-sum on a time-varying directed graph to de-bias inter-client updates: each client updates $v_i$ with local SGD and exchanges $u$ with its neighbors through gradient pushes and pulls. The authors prove a non-convex convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ and show that stronger connectivity speeds up convergence. Experiments on CIFAR-10/100 and Tiny-ImageNet demonstrate state-of-the-art performance under heterogeneous data and computation settings. Ablation studies confirm the benefits of both partial personalization and directed communication, and the approach offers flexible, resource-efficient decentralized collaboration for personalized learning.
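For orientation, the partially personalized objective that this summary describes can be written roughly as follows. This is a sketch only: the symbols $F$, $u$, $V$, $v_i$, and $m$ appear in the theorem statement of this page, while the per-client loss $F_i$ is an assumed name for the local objective of client $i$.

```latex
% Shared backbone u, personalized heads V = (v_1, ..., v_m), m clients.
% F_i is an assumed symbol for the local loss of client i.
\min_{u,\, V} \; F(u, V) := \frac{1}{m} \sum_{i=1}^{m} F_i(u, v_i),
\qquad V = (v_1, \ldots, v_m).
```

Under this reading, only $u$ is exchanged between clients, while each $v_i$ never leaves client $i$.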

Abstract

Personalized Federated Learning (PFL) aims to find the best personalized model for each client. To avoid the single point of failure and the communication bottleneck of server-based FL, we focus on Decentralized Personalized Federated Learning (DPFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized methods in DPFL are based on undirected, symmetric topologies. However, heterogeneity in data, computation, and communication resources results in large variance across the personalized models, which leads undirected aggregation to suboptimal personalized performance and convergence that is not guaranteed. To address these issues, we propose a directed-collaboration DPFL framework that incorporates stochastic gradient push and partial model personalization, called Decentralized Federated Partial Gradient Push (DFedPGP). It personalizes the linear classifier of a modern deep model to customize the local solution and learns a consensus representation in a fully decentralized manner. Clients share gradients only with a subset of neighbors over directed, asymmetric topologies, which allows flexible choices for resource efficiency and better convergence. Theoretically, we show that DFedPGP achieves a superior convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in the general non-convex setting, and prove that tighter connectivity among clients speeds up convergence. The proposed method achieves state-of-the-art (SOTA) accuracy in both data and computation heterogeneity scenarios, demonstrating the efficiency of directed collaboration and partial gradient push.
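The de-biasing role of Push-sum on a directed, asymmetric topology can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the graph, the helper names `directed_mixing_matrix` and `push_sum`, and the toy dimensions are all assumptions made for illustration, and the local SGD steps on $u$ and $v_i$ are omitted so that only the mixing step is shown. Each client keeps a numerator `x` and a scalar weight `w`, both mixed with the same column-stochastic matrix, and reads the de-biased shared parameters as `z = x / w`.

```python
import numpy as np


def directed_mixing_matrix(m: int) -> np.ndarray:
    """Column-stochastic matrix for a hypothetical directed graph.

    Client i pushes to itself and to client (i + 1) % m; client 0 also
    pushes to client 2, which makes the topology asymmetric.
    P[j, i] is the share of client i's mass that is sent to client j.
    """
    P = np.zeros((m, m))
    for i in range(m):
        receivers = {i, (i + 1) % m}
        if i == 0:
            receivers.add(2 % m)
        for j in receivers:
            P[j, i] = 1.0 / len(receivers)
    return P


def push_sum(u0: np.ndarray, P: np.ndarray, rounds: int):
    """Run plain Push-sum averaging on the shared parameters only.

    Returns the biased numerators x, the push-sum weights w, and the
    de-biased estimates z = x / w (one row per client).
    """
    x = u0.astype(float).copy()
    w = np.ones(u0.shape[0])
    for _ in range(rounds):
        x = P @ x  # each client sums the parameter mass pushed to it
        w = P @ w  # the weights follow exactly the same mixing
    z = x / w[:, None]
    return x, w, z


m, d = 5, 3  # toy sizes: 5 clients, 3 shared parameters
rng = np.random.default_rng(0)
u0 = rng.normal(loc=1.0, size=(m, d))  # initial shared parts, one row per client
P = directed_mixing_matrix(m)
x, w, z = push_sum(u0, P, rounds=200)

print("true average      :", u0.mean(axis=0))
print("client 0 biased x :", x[0])
print("client 0 de-biased:", z[0])
```

Because the matrix is column-stochastic but not row-stochastic, the raw numerators `x` settle at client-dependent multiples of the average, whereas the ratio `z` recovers the average on every client. In a personalized variant along the lines described above, the heads $v_i$ would simply be left out of `x` and never mixed.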

Paper Structure (23 sections, 5 theorems, 39 equations, 5 figures, 8 tables, 1 algorithm)

Related Work
Methodology
Problem Setup
Algorithm
Theoretical Analysis
Assumption
Challenge and Proof
Experiments
Experiment Setup
Performance Evaluation
Ablation Study
Conclusion
More Details in the Related Works
More details in the client selection
More details in the experiments
...and 8 more sections

Key Result

Theorem 1

Under Assumptions 1-5, suppose the local learning rates satisfy $0<\eta_u<\frac{\delta}{4\sqrt{2}L_uK_u}$, and let $F^{*}$ denote the minimal value of $F$, i.e., $F(\bar{u}, V)\ge F^*$ for all $\bar{u} \in \mathbb{R}^{d}$ and $V=(v_1,\ldots,v_m)\in\mathbb{R}^{d_1+\ldots+d_m}$. Let $\bar{u}^t = \frac{1}{m}\sum \ldots$ Therefore, we have the convergence analysis below:

Figures (5)

Figure 1: Test accuracy on CIFAR-10 (first row) and CIFAR-100 (second row) with heterogeneous data partitions. Due to limited space, we only show the training progress of typical methods.
Figure 2:
Figure 5: Dirichlet $\alpha=0.3$ on CIFAR-100.
Figure 6: Pathological $c = 10$ on CIFAR-100.
Figure 8: Test accuracy on Tiny-ImageNet with heterogeneous data partitions.

Theorems & Definitions (11)

Theorem 1
Corollary 1: Convergence Rate for DFedPGP
Remark 1
Remark 2
Remark 3
Lemma 1: Local update for personalized model $v_i$ in DFedPGP (Lemma 23 of pillutla2022federated)
proof
Lemma 2: Local update for shared model $u_i$ in DFedPGP
proof
Lemma 3: Mixing connectivity (assran2020asynchronous)
...and 1 more
