Federated Dropout: Convergence Analysis and Resource Allocation
Sijing Xie, Dingzhu Wen, Xiaonan Liu, Changsheng You, Tharmalingam Ratnarajah, Kaibin Huang
TL;DR
FedDrop tackles the communication and computation bottlenecks of federated edge learning (FEEL) by training subnetworks generated via dropout, which reduces per-round latency while preserving the full model at convergence. The authors show that subnet gradients form a variance-bounded estimator of the full gradient, with variance scaling as $\gamma/(1-\gamma)$ for dropout rate $\gamma \in [0,\theta)$. They formulate a per-round joint design problem and transform it into a convex program, yielding closed-form optimal dropout rates and bandwidth allocations with complexity $O(K^2)$. Experiments on LeNet and AlexNet trained on CIFAR-100 (IID and non-IID) demonstrate faster convergence and improved accuracy under resource constraints, with dropout acting as regularization in overfitting scenarios. The work provides a practical framework for resource-aware federated learning and suggests extensions to LLMs and integration with compression or AirComp.
Abstract
Federated Dropout is an efficient technique for overcoming both the communication and computation bottlenecks of deploying federated learning at the network edge. In each training round, an edge device only needs to update and transmit a sub-model, which is generated by the standard dropout method from deep learning, and this effectively reduces the per-round latency. However, a theoretical convergence analysis for Federated Dropout is still lacking in the literature, particularly regarding the quantitative influence of the dropout rate on convergence. To address this issue, we use the Taylor expansion method to show mathematically that the gradient variance increases with a scaling factor of $\gamma/(1-\gamma)$, with $\gamma \in [0, \theta)$ denoting the dropout rate and $\theta$ being the maximum dropout rate that ensures a reduction of the loss function. Based on this approximation, we provide the convergence analysis for Federated Dropout. Specifically, it is shown that a larger dropout rate at each device leads to a slower convergence rate. This provides a theoretical foundation for reducing the convergence latency by trading off the per-round latency against the total number of rounds until convergence. Moreover, a low-complexity algorithm is proposed to jointly optimize the dropout rate and the bandwidth allocation so as to minimize the loss function in all rounds under a given per-round latency and limited network resources. Finally, numerical results are provided to verify the effectiveness of the proposed algorithm.
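The $\gamma/(1-\gamma)$ scaling factor can be made concrete with a small numerical sketch. This is not the paper's Taylor-expansion derivation; it only illustrates, for a single coordinate, that a Bernoulli keep-mask rescaled by $1/(1-\gamma)$ has mean 1 and variance $\gamma/(1-\gamma)$. The rescaled ("inverted") form of dropout and the helper names `inverted_dropout_mask` and `empirical_mask_stats` are assumptions made for this illustration.

```python
import numpy as np


def inverted_dropout_mask(shape, gamma, rng):
    """Return a mask whose entries are 0 (dropped) or 1/(1-gamma) (kept).

    Assumption for illustration: each entry is dropped independently
    with probability gamma and kept entries are rescaled by 1/(1-gamma).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    keep = rng.random(shape) >= gamma  # True with probability 1 - gamma
    return keep / (1.0 - gamma)


def empirical_mask_stats(gamma, n_samples=200_000, seed=0):
    """Estimate the mean and variance of one rescaled mask entry."""
    rng = np.random.default_rng(seed)
    mask = inverted_dropout_mask(n_samples, gamma, rng)
    return mask.mean(), mask.var()


if __name__ == "__main__":
    for gamma in (0.1, 0.3, 0.5):
        mean, var = empirical_mask_stats(gamma)
        theory = gamma / (1.0 - gamma)
        print(f"gamma={gamma:.1f}  mean={mean:.3f}  var={var:.3f}  "
              f"gamma/(1-gamma)={theory:.3f}")
```

Under these assumptions the empirical variance should track $\gamma/(1-\gamma)$ closely and grow quickly as $\gamma$ approaches 1, which is consistent with the abstract's statement that a larger dropout rate slows convergence.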
