Federated Dropout: Convergence Analysis and Resource Allocation

Sijing Xie; Dingzhu Wen; Xiaonan Liu; Changsheng You; Tharmalingam Ratnarajah; Kaibin Huang

TL;DR

FedDrop addresses the communication and computation bottlenecks of federated edge learning (FEEL) by training subnetworks generated via dropout, which reduces per-round latency while preserving the full model at convergence. The authors show that a subnet's gradient is a variance-bounded estimator of the full gradient, with the variance scaling as $\gamma/(1-\gamma)$ for dropout rate $\gamma \in [0,\theta)$. They formulate a per-round joint design problem, transform it into a convex program, and obtain closed-form optimal dropout rates and bandwidth allocations with complexity $O(K^2)$. Experiments with LeNet and AlexNet on CIFAR-100 (IID and non-IID) show faster convergence and improved accuracy under resource constraints, with dropout acting as a regularizer in overfitting scenarios. The work provides a practical framework for resource-aware federated learning and suggests extensions to LLMs and integration with compression or over-the-air computation (AirComp).

Abstract

Federated Dropout is an efficient technique to overcome both communication and computation bottlenecks for deploying federated learning at the network edge. In each training round, an edge device only needs to update and transmit a sub-model, which is generated by the typical method of dropout in deep learning, and thus effectively reduces the per-round latency. However, the theoretical convergence analysis for Federated Dropout is still lacking in the literature, particularly regarding the quantitative influence of the dropout rate on convergence. To address this issue, by using the Taylor expansion method, we mathematically show that the gradient variance increases with a scaling factor of $\gamma/(1-\gamma)$, with $\gamma \in [0, \theta)$ denoting the dropout rate and $\theta$ being the maximum dropout rate ensuring the loss function reduction. Based on the above approximation, we provide the convergence analysis for Federated Dropout. Specifically, it is shown that a larger dropout rate of each device leads to a slower convergence rate. This provides a theoretical foundation for reducing the convergence latency by making a tradeoff between the per-round latency and the overall rounds until convergence. Moreover, a low-complexity algorithm is proposed to jointly optimize the dropout rate and the bandwidth allocation for minimizing the loss function in all rounds under a given per-round latency and limited network resources. Finally, numerical results are provided to verify the effectiveness of the proposed algorithm.

Paper Structure (39 sections, 4 theorems, 60 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
System Model
Network Model
Federated Learning Model
Dropout
FedDrop Framework
Generation (Subnets Generation)
Push (Model Downloading)
Computation (Local Model Updating)
Pull (Local Model Uploading)
Aggregation (Global Aggregation and Updating)
Latency and Energy Consumption Models
Generation Step
Push Step
Computation Step
...and 24 more sections

Key Result

Lemma 1

Under Assumptions 1, 2, and 3, the gradient vector of a subnet is a variance-bounded estimate of the whole network's gradient vector.

Figures (6)

Figure 1: The operations of FL with FedDrop in a wireless system.
Figure 2: The Dirichlet distribution of data on all the clients.
Figure 3: Effects of per-round latency on convergence round in underfitting and overfitting scenarios, respectively.
Figure 4: Effects of system bandwidth on convergence round in underfitting and overfitting scenarios, respectively.
Figure 5: Effects of per-round latency on testing accuracy in underfitting and overfitting scenarios, respectively.
...and 1 more figure

Theorems & Definitions (9)

Remark 1: Communication and Computational Overhead
Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
Lemma 3
proof
