Decentralized Directed Collaboration for Personalized Federated Learning

Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen

TL;DR

This work tackles personalized federated learning in a decentralized setting with heterogeneous data and device resources by introducing DFedPGP, a directed, partially personalized gradient-push framework. It partitions each model into a shared backbone $u$ and a client-specific head $v_i$, and uses Push-sum on a time-varying directed graph to de-bias inter-client updates: $v_i$ is updated by local SGD only, while $u$ is updated locally and then exchanged through gradient pushes and pulls. The authors prove a non-convex convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ and show that stronger connectivity speeds convergence, while experiments on CIFAR-10/100 and Tiny-ImageNet demonstrate state-of-the-art performance under heterogeneous data and computation settings. Ablation studies confirm the benefits of both partial personalization and directed communication, and the approach offers flexible, resource-efficient decentralized collaboration for personalized learning.
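
To make the de-biasing role of Push-sum concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `push_sum`, the fixed four-node graph, and the uniform split over out-neighbors are assumptions made for illustration (the paper works with a time-varying directed graph). Each node keeps a numerator `x` and a weight `w`, pushes equal shares of both along its outgoing edges, and reads off the de-biased estimate `x / w`.

```python
def push_sum(values, out_neighbors, rounds=100):
    """Push-sum averaging on a fixed directed graph (illustrative sketch).

    values: initial scalar held by each node.
    out_neighbors: out_neighbors[i] lists the nodes that i sends to
                   (a self-loop is added automatically).
    Returns the de-biased estimates x_i / w_i after `rounds` rounds.
    """
    n = len(values)
    x = list(values)   # numerators; biased on their own on a directed graph
    w = [1.0] * n      # push-sum weights; track the same bias
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = [i] + list(out_neighbors[i])  # self-loop plus out-neighbors
            share = 1.0 / len(targets)               # column-stochastic split
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # Dividing by w removes the bias introduced by asymmetric mixing.
    return [xi / wi for xi, wi in zip(x, w)]


# Assumed example graph: directed ring 0->1->2->3->0 with one extra edge 0->2.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 4.0], graph)
```

On this strongly connected graph every entry of `estimates` approaches the plain average of the initial values, even though the mixing weights are not symmetric. The numerators `x` alone would not agree across nodes, which is why the weight `w` is carried along.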

Abstract

Personalized Federated Learning (PFL) aims to find the best personalized model for each client. To avoid the single point of failure and the communication bottleneck of server-based FL, we focus on Decentralized Personalized Federated Learning (DPFL), which performs distributed model training in a peer-to-peer (P2P) manner. Most personalized methods in DPFL rely on undirected, symmetric topologies. However, heterogeneity in data, computation, and communication resources causes large variance among the personalized models, so undirected aggregation yields suboptimal personalized performance and has no convergence guarantee. To address these issues, we propose a directed-collaboration DPFL framework that combines stochastic gradient push with partial model personalization, called Decentralized Federated Partial Gradient Push (DFedPGP). It personalizes the linear classifier of a modern deep model to customize the local solution and learns a consensus representation in a fully decentralized manner. Clients share gradients only with a subset of neighbors over directed, asymmetric topologies, which allows flexible choices for resource efficiency and improves convergence. Theoretically, we show that DFedPGP achieves a superior convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in the general non-convex setting, and we prove that tighter connectivity among clients speeds up convergence. The proposed method achieves state-of-the-art (SOTA) accuracy in both data and computation heterogeneity scenarios, demonstrating the efficiency of directed collaboration and partial gradient push.
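
The split between a personalized part that stays on the client and a shared part that is pushed to neighbors can be sketched on a toy problem. Everything below is an assumption for illustration rather than the paper's algorithm or code: the function name `dfedpgp_toy`, the decoupled quadratic loss, the fixed graph, a single local step per round, and the ordering (personal update first, then shared update) are all hypothetical choices.

```python
def dfedpgp_toy(a, b, out_neighbors, rounds=1000, eta_u=0.05, eta_v=0.1):
    """Toy loop: personal parts stay local, shared parts use gradient push.

    Client i holds the hypothetical loss
        f_i(u, v_i) = 0.5 * (u - a[i])**2 + 0.5 * (v_i - b[i])**2,
    so the shared optimum is mean(a) and the personal optimum is b[i].
    """
    m = len(a)
    x = [0.0] * m   # shared numerators (sent to out-neighbors)
    w = [1.0] * m   # push-sum weights (sent alongside x)
    v = [0.0] * m   # personal parts (never communicated)
    for _ in range(rounds):
        # Local step on the personal part only.
        for i in range(m):
            v[i] -= eta_v * (v[i] - b[i])
        # Local step on the shared part; the gradient is evaluated at the
        # de-biased point z_i = x_i / w_i, as in stochastic gradient push.
        half = [x[i] - eta_u * (x[i] / w[i] - a[i]) for i in range(m)]
        # Push step: split shared state and weight over self + out-neighbors.
        new_x = [0.0] * m
        new_w = [0.0] * m
        for i in range(m):
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * half[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    z = [x[i] / w[i] for i in range(m)]  # de-biased shared estimates
    return z, v


# Assumed example: four clients on a directed, asymmetric graph.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
z, v = dfedpgp_toy(a=[1.0, 2.0, 3.0, 4.0], b=[-1.0, 0.0, 1.0, 2.0],
                   out_neighbors=graph)
```

In this toy setting each `v[i]` settles at its own target `b[i]`, while the de-biased shared estimates `z` cluster around the average of `a`. With a constant step size the individual `z[i]` keep a small step-size-dependent offset from one another; only their mean matches the average of `a`.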

Paper Structure

This paper contains 23 sections, 5 theorems, 39 equations, 5 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions 1-5, suppose the local learning rates satisfy $0<\eta_u<\frac{\delta}{4\sqrt{2}L_uK_u}$. Let $F^{*}$ denote the minimal value of $F$, i.e., $F(\bar{u}, V)\ge F^*$ for all $\bar{u} \in \mathbb{R}^{d}$ and $V=(v_1,\ldots,v_m)\in\mathbb{R}^{d_1+\ldots+d_m}$. Let $\bar{u}^t = \frac{1}{m}\sum_{i=1}^{m} u_i^t$ … Therefore, we have the convergence analysis below:

Figures (5)

  • Figure 1: Test accuracy on CIFAR-10 (first row) and CIFAR-100 (second row) with heterogeneous data partitions. Due to limited space, we show the training progress of representative methods only.
  • Figure 2:
  • Figure 5: Dirichlet $\alpha=0.3$ on CIFAR-100.
  • Figure 6: Pathological $c = 10$ on CIFAR-100.
  • Figure 8: Test accuracy on Tiny-ImageNet with heterogeneous data partitions.

Theorems & Definitions (11)

  • Theorem 1
  • Corollary 1: Convergence Rate for DFedPGP
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1: Local update for personalized model $v_i$ in DFedPGP (Lemma 23 of Pillutla et al., 2022)
  • proof
  • Lemma 2: Local update for shared model $u_i$ in DFedPGP
  • proof
  • Lemma 3: Mixing connectivity (Assran and Rabbat, 2020)
  • ...and 1 more