Taming Subnet-Drift in D2D-Enabled Fog Learning: A Hierarchical Gradient Tracking Approach
Evan Chen, Shiqiang Wang, Christopher G. Brinton
TL;DR
This work tackles subnet drift in semi-decentralized federated learning (SD-FL) by introducing Semi-Decentralized Gradient Tracking (SD-GT), which uses two gradient-tracking terms to stabilize updates across the device-to-device (D2D) and device-server (DS) communication layers, which operate on two timescales. It provides Lyapunov-based convergence bounds for both non-convex and strongly convex objectives and proposes a co-optimization framework that trades learning speed against communication cost by tuning the subnet sampling rates and the number of D2D rounds. The theoretical results are corroborated by experiments on real-world and synthetic datasets, which show substantial improvements in model quality and communication efficiency over SD-FL and gradient-tracking baselines. The approach supports robust, scalable fog-learning deployments whose efficiency can be tuned for heterogeneous networks.
Abstract
Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) proposes a solution that divides model cooperation into two stages: at the lower stage, device-to-device (D2D) communication is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communication for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Analytical characterization of SD-GT yields convergence upper bounds for both non-convex and strongly convex problems under a suitable choice of step size. We employ the resulting bounds to develop a co-optimization algorithm that selects subnet sampling rates and D2D rounds according to a performance-efficiency trade-off. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to SD-FL and gradient-tracking baselines on several datasets.
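To make the two-stage structure of SD-FL concrete, the following is a minimal Python sketch of the aggregation skeleton only. It is not the SD-GT algorithm: it contains no local gradient steps and no tracking terms, and every name in it (d2d_round, global_aggregate, sd_fl_aggregate, the mixing weight alpha, and the rule that the server pulls the first device of each subnet) is a hypothetical choice made for illustration. Models are scalars to keep the arithmetic easy to follow.

```python
def d2d_round(models, subnet, alpha=0.5):
    """One D2D consensus round: each device in the subnet moves toward the subnet mean.

    This corresponds to a doubly stochastic mixing step, so the subnet mean is preserved.
    alpha is an assumed mixing weight, not a value from the paper.
    """
    mean = sum(models[i] for i in subnet) / len(subnet)
    for i in subnet:
        models[i] = (1 - alpha) * models[i] + alpha * mean


def global_aggregate(models, subnets):
    """DS stage: the server pulls one device per subnet and broadcasts the weighted average.

    Assumption: the sampled device is simply the first one listed in each subnet,
    and each sample is weighted by its subnet size.
    """
    total = sum(len(s) for s in subnets)
    agg = sum(len(s) * models[s[0]] for s in subnets) / total
    for s in subnets:
        for i in s:
            models[i] = agg
    return agg


def sd_fl_aggregate(models, subnets, d2d_rounds):
    """Run d2d_rounds of in-subnet mixing, then one global aggregation."""
    models = list(models)  # work on a copy so the caller's list is untouched
    for _ in range(d2d_rounds):
        for s in subnets:
            d2d_round(models, s)
    agg = global_aggregate(models, subnets)
    return agg, models


if __name__ == "__main__":
    initial = [0.0, 2.0, 4.0, 10.0, 20.0]
    subnets = [[0, 1, 2], [3, 4]]
    for rounds in (0, 1, 50):
        agg, _ = sd_fl_aggregate(initial, subnets, rounds)
        print(rounds, agg)
```

In this toy setting, the sampled device is a poor stand-in for its subnet when no D2D rounds are run, and the server's aggregate approaches the true network-wide average as the number of D2D rounds grows. This is the kind of trade-off between D2D rounds and sampling that the co-optimization in the abstract is concerned with, although the sketch does not model communication cost.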
