Topology Learning for Heterogeneous Decentralized Federated Learning over Unreliable D2D Networks
Zheshun Wu, Zenglin Xu, Dun Zeng, Junfan Li, Jie Liu
TL;DR
This work addresses decentralized federated learning (DFL) over unreliable UDP-based device-to-device (D2D) networks with heterogeneous data distributions. It derives a convergence bound that introduces the unreliable-links-aware neighborhood discrepancy $\bar{H}$ and proposes ToLRDUL, a topology-learning method that minimizes $\bar{H}$ by jointly considering representation discrepancy and link outages via Frank-Wolfe optimization over a sparse set of topologies. The approach uses Gaussian representations to approximate gradient discrepancies and exchanges compact encrypted statistics every $K$ rounds to reduce communication. Experiments on Dirichlet and Rotated CIFAR-10 show that ToLRDUL converges faster and reaches higher test accuracy than the baselines, with lower latency, which supports the theoretical claims and the method's practical value for robust DFL in unreliable D2D environments.
Abstract
With the proliferation of intelligent mobile devices in wireless device-to-device (D2D) networks, decentralized federated learning (DFL) has attracted significant interest. Compared to centralized federated learning (CFL), DFL mitigates the risk of central server failures due to communication bottlenecks. However, DFL faces several challenges, such as severe heterogeneity of data distributions across diverse environments, and transmission outages and packet errors caused by the adoption of the User Datagram Protocol (UDP) in D2D networks. These challenges often degrade the convergence of DFL training. To address them, we conduct a theoretical convergence analysis for DFL and derive a convergence bound. By defining a novel quantity in this bound, named the unreliable-links-aware neighborhood discrepancy, we formulate a tractable optimization objective and develop a novel Topology Learning method considering the Representation Discrepancy and Unreliable Links in DFL, named ToLRDUL. Extensive experiments under both feature-skew and label-skew settings validate the effectiveness of the proposed method, demonstrating improved convergence speed and test accuracy, consistent with our theoretical findings.
