FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

Feijie Wu; Xingchen Wang; Yaqing Wang; Tianci Liu; Lu Su; Jing Gao

TL;DR

FIARSE addresses computation heterogeneity in federated learning by extracting client-specific submodels whose important parameters are implied by their magnitudes. It introduces Threshold-Controlled Biased Gradient Descent (TCB-GD) and magnitude-based submodel construction to train and aggregate across heterogeneous clients without maintaining per-parameter importance scores, achieving a convergence rate of $O(1/\sqrt{T})$ under standard non-convex assumptions. The approach supports partial participation, maintains log-scale memory efficiency, and demonstrates superior local and global accuracy across CIFAR-10/100 and AGNews with ResNet-18 and RoBERTa-base. The results suggest FIARSE can effectively deploy FL in real-world, resource-constrained environments while eliminating reliance on public data and extensive per-parameter storage.
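The magnitude-based construction described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`magnitude_threshold`, `extract_mask`, `apply_mask`), the `keep_ratio` parameter, and the use of a flat list of floats as a stand-in for model parameters are all assumptions made for illustration. Whether the threshold is chosen globally or per layer is not stated in this summary; the sketch uses a single global threshold.

```python
from typing import List


def magnitude_threshold(params: List[float], keep_ratio: float) -> float:
    """Return the k-th largest absolute value, with k = floor(keep_ratio * n), at least 1.

    `keep_ratio` is a hypothetical knob standing in for a client's capacity.
    """
    if not params:
        raise ValueError("params must be non-empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(keep_ratio * len(params)))
    magnitudes = sorted((abs(p) for p in params), reverse=True)
    return magnitudes[k - 1]


def extract_mask(params: List[float], keep_ratio: float) -> List[int]:
    """Binary mask keeping parameters whose magnitude reaches the threshold.

    Ties at the threshold are all kept, so the mask can hold slightly more
    than k entries when several parameters share the same magnitude.
    """
    threshold = magnitude_threshold(params, keep_ratio)
    return [1 if abs(p) >= threshold else 0 for p in params]


def apply_mask(params: List[float], mask: List[int]) -> List[float]:
    """Zero out the parameters excluded from the submodel."""
    return [p * m for p, m in zip(params, mask)]


if __name__ == "__main__":
    global_params = [0.9, -0.05, 0.4, -1.2, 0.01, 0.3, -0.7, 0.02]
    for ratio in (0.25, 0.5, 1.0):
        mask = extract_mask(global_params, ratio)
        print(ratio, mask, apply_mask(global_params, mask))
```

Because the mask depends only on the current parameter values, no separate importance score has to be stored per parameter, which is the property the TL;DR highlights. In this sketch a smaller `keep_ratio` always yields a subset of the parameters kept by a larger one.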

Abstract

In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacity. In this work, we propose Federated Importance-Aware Submodel Extraction (FIARSE), a novel approach that dynamically adjusts submodels based on the importance of model parameters, thereby overcoming the limitations of previous static and dynamic submodel extraction methods. Compared to existing works, the proposed method offers a theoretical foundation for the submodel extraction and eliminates the need for additional information beyond the model parameters themselves to determine parameter importance, significantly reducing the overhead on clients. Extensive experiments are conducted on various datasets to showcase the superior performance of the proposed FIARSE.
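When clients train submodels of different sizes, the server has to combine updates that cover different subsets of the parameters. The paper's outline lists "Partial Averaging" as a generic solution for this setting; the sketch below shows one plain reading of that idea, in which each coordinate is averaged only over the clients whose submodel contains it. This is an illustrative assumption, not a confirmed description of the FIARSE aggregation rule, and the function name `aggregate_masked_updates` and the `lr` parameter are hypothetical.

```python
from typing import List


def aggregate_masked_updates(
    global_params: List[float],
    client_updates: List[List[float]],
    client_masks: List[List[int]],
    lr: float = 1.0,
) -> List[float]:
    """Per-coordinate averaging over the clients that hold each parameter.

    client_updates[i][j] is client i's update (e.g. an accumulated gradient)
    for parameter j; client_masks[i][j] is 1 if parameter j belongs to
    client i's submodel. Coordinates held by no client are left unchanged.
    """
    new_params = list(global_params)
    for j in range(len(global_params)):
        holders = [i for i, mask in enumerate(client_masks) if mask[j] == 1]
        if not holders:
            continue  # no client trained this parameter in this round
        avg_update = sum(client_updates[i][j] for i in holders) / len(holders)
        new_params[j] = global_params[j] - lr * avg_update
    return new_params


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0]
    masks = [[1, 1, 0], [1, 0, 0]]  # client 0 holds two parameters, client 1 holds one
    updates = [[0.25, 0.5, 0.0], [0.75, 0.0, 0.0]]
    print(aggregate_masked_updates(params, updates, masks))
```

Dividing by the number of holders rather than the total number of clients keeps rarely selected parameters from being scaled down simply because few clients could afford them.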

Paper Structure (50 sections, 7 theorems, 68 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Contributions.
Related Work
Computation Heterogeneity in FL.
Model Customization in FL.
Model Sparsification in FL.
Preliminary: Model-Heterogeneous Federated Learning
Problem Formulation.
A Generic Solution: Partial Averaging.
Limitations.
FIARSE
Solution Overview.
Submodel Construction
Threshold-Controlled Biased Gradient Descent (TCB-GD).
Effectiveness.
...and 35 more sections

Key Result

Theorem 5.4

Suppose that the L-smoothness, variance, and mask-reduction assumptions hold. We define $F(\cdot) \overset{\triangle}{=} \dots$ Denote $T$ as the total number of communication rounds. Therefore, the convergence rate of FIARSE fo…

Figures (9)

Figure 1: Three types of submodel extraction for model training, i.e., static, dynamic, and importance-aware (ours). The figure shows the global model on the server and the local models of two consecutive rounds on a client. Solid lines represent the parameters preserved in the local model, while dashed lines indicate the parameters excluded from the local model. In importance-aware submodel extraction, line thickness represents the importance of the parameters.
Figure 2: Histograms of various submodel extraction methods on CIFAR-10 under four submodel sizes. Each histogram shows the number of clients achieving different levels of test accuracy.
Figure 3: Comparison of test accuracy across communication rounds for different submodel extraction strategies under four model sizes (1/64, 1/16, 1/4, 1.0) on the global test datasets of CIFAR-10 (upper, a-d) and CIFAR-100 (lower, e-h).
Figure 4: Comparison of test accuracy across different submodel sizes for different submodel extraction methods on a global test dataset of CIFAR-10.
Figure 5: Histograms of various submodel extraction methods on CIFAR-10 under five submodel sizes. Each histogram shows the number of clients achieving different levels of test accuracy.
...and 4 more figures

Theorems & Definitions (13)

Theorem 5.4
Corollary 5.5
Remark 5.6
Lemma C.1
proof
Lemma C.2
proof
Lemma C.3
proof
Lemma C.4
...and 3 more
