Decentralized Directed Collaboration for Personalized Federated Learning
Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen
TL;DR
This work tackles personalized federated learning in a decentralized setting with heterogeneous data and device resources by introducing DFedPGP, a directed, partially personalized gradient-push framework. It partitions each model into a shared backbone $u$ and client-specific heads $v_i$. Each client updates its own $v_i$ with local SGD, and trains $u$ by exchanging gradients with neighbors over a time-varying directed graph, using Push-sum to de-bias the inter-client aggregation. The authors prove a non-convex convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ and show that stronger connectivity among clients speeds up convergence. Experiments on CIFAR-10/100 and Tiny-ImageNet demonstrate state-of-the-art performance under heterogeneous data and computation settings. Ablation studies confirm the benefits of both partial personalization and directed communication, and the approach offers flexible, resource-efficient decentralized collaboration for personalized learning.
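Push-sum matters on a directed graph because a node can only choose how to split its own mass among its out-neighbors. The resulting mixing matrix is column-stochastic but generally not row-stochastic, so plain mixing drifts toward a graph-dependent weighted average. The sketch below illustrates the generic push-sum averaging protocol, not the paper's code. The equal-split weights, the helper names `column_stochastic` and `push_sum`, and the two alternating 4-node graphs are hypothetical choices made for illustration.

```python
import numpy as np

def column_stochastic(out_neighbors, n):
    """Mixing matrix for one directed graph (hypothetical equal-split rule):
    node j splits its mass equally among itself and its out-neighbors,
    so every column sums to 1 while rows need not."""
    A = np.zeros((n, n))
    for j in range(n):
        targets = [j] + list(out_neighbors[j])
        for i in targets:
            A[i, j] = 1.0 / len(targets)
    return A

def push_sum(x0, graphs, rounds):
    """Push-sum averaging over a time-varying sequence of directed graphs.
    Returns (de-biased estimates x / w, raw mixed values x)."""
    x = np.asarray(x0, dtype=float).copy()
    w = np.ones_like(x)  # push-sum weights, start at 1 on every node
    for t in range(rounds):
        A = column_stochastic(graphs[t % len(graphs)], len(x))
        x, w = A @ x, A @ w  # values and weights travel along the same edges
    return x / w, x

# Two alternating directed graphs; the extra edge 0 -> 2 makes mixing asymmetric.
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
ring_chord = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
z, naive = push_sum([1.0, 2.0, 3.0, 10.0], [ring, ring_chord], rounds=200)
```

Dividing by the weights `w` recovers the true average (4.0) on every node. The raw values `naive` still sum to 16, because column-stochastic mixing preserves the total, but the mass is spread unevenly across nodes, so the per-node estimates are biased.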
Abstract
Personalized Federated Learning (PFL) aims to find the best personalized model for each client. To avoid the single point of failure and the communication bottleneck of server-based FL, we concentrate on Decentralized Personalized Federated Learning (DPFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized methods in DPFL are based on undirected and symmetric topologies. However, heterogeneity in data, computation, and communication resources leads to large variance among the personalized models, and this causes undirected aggregation to yield suboptimal personalized performance and unguaranteed convergence. To address these issues, we propose a directed-collaboration DPFL framework that incorporates stochastic gradient push and partial model personalization, called Decentralized Federated Partial Gradient Push (DFedPGP). It personalizes the linear classifier of a modern deep model to customize each local solution, and it learns a consensus representation in a fully decentralized manner. Clients share gradients only with a subset of neighbors over directed and asymmetric topologies, which allows flexible choices for resource efficiency and better convergence. Theoretically, we show that DFedPGP achieves a superior convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in the general non-convex setting, and we prove that tighter connectivity among clients speeds up convergence. The proposed method achieves state-of-the-art (SOTA) accuracy in both data-heterogeneity and computation-heterogeneity scenarios, demonstrating the efficiency of directed collaboration and partial gradient push.
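A toy linear model can illustrate the split between a personalized head and a shared, consensus part trained by gradient push over a directed topology. This is a minimal sketch in the style of stochastic gradient push, not the DFedPGP algorithm from the paper. Several choices here are assumptions rather than details from the source: the task (a shared weight vector `u` plus a per-client scalar offset standing in for the head `v_i`), the use of full-batch gradients instead of stochastic ones (for determinism), the step sizes, the number of head steps, and all function names. Only the shared part travels over the directed graph, and the heads never leave their clients.

```python
import numpy as np

def column_stochastic(out_neighbors, n):
    # Hypothetical equal-split rule: node j divides its mass among itself
    # and its out-neighbors, so each column sums to 1.
    A = np.zeros((n, n))
    for j in range(n):
        targets = [j] + list(out_neighbors[j])
        for i in targets:
            A[i, j] = 1.0 / len(targets)
    return A

def make_clients(n, m, d, rng):
    # Hypothetical noiseless task: y = X @ u_true + c_i
    # (shared weights, plus a client-specific offset c_i).
    u_true = rng.normal(size=d)
    offsets = rng.normal(scale=3.0, size=n)
    data = []
    for i in range(n):
        X = rng.normal(size=(m, d))
        data.append((X, X @ u_true + offsets[i]))
    return u_true, offsets, data

def train(data, graphs, rounds=2000, lr=0.05, head_steps=5):
    n, d = len(data), data[0][0].shape[1]
    x = np.zeros((n, d))  # push-sum numerators for the shared part u
    w = np.ones(n)        # push-sum weights
    v = np.zeros(n)       # personal heads, never communicated
    for t in range(rounds):
        z = x / w[:, None]  # de-biased local copies of u
        grads = np.zeros((n, d))
        for i, (X, y) in enumerate(data):
            # 1) Personal step: local gradient descent on v_i with u frozen.
            for _ in range(head_steps):
                r = X @ z[i] + v[i] - y
                v[i] -= lr * r.mean()
            # 2) Gradient of the local squared loss w.r.t. the shared part.
            r = X @ z[i] + v[i] - y
            grads[i] = X.T @ r / len(y)
        # 3) Gradient push: local step on x, then mix over the directed graph.
        A = column_stochastic(graphs[t % len(graphs)], n)
        x = A @ (x - lr * grads)
        w = A @ w
    return x / w[:, None], v

rng = np.random.default_rng(0)
u_true, offsets, data = make_clients(n=4, m=50, d=3, rng=rng)
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
ring_chord = {0: [1, 2], 1: [2], 2: [3], 3: [0]}  # asymmetric extra edge
u_hat, v_hat = train(data, [ring, ring_chord])
```

The heads absorb the per-client offsets, so all clients agree on the shared weights even though their raw targets differ. In this noiseless toy, every local gradient vanishes at the same point. That is a much stronger assumption than the heterogeneous setting the paper studies, where the shared optimum is a compromise across clients.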
