FedGMR: Federated Learning with Gradual Model Restoration under Asynchrony and Model Heterogeneity

Chengjie Ma, Seungeun Oh, Jihong Park, Seong-Lyun Kim

TL;DR

FedGMR tackles late-stage capacity bottlenecks in model-heterogeneous federated learning (FL) by gradually restoring sub-model density for bandwidth-constrained clients. It combines a two-stage density strategy with mask-aware, buffered aggregation to keep updates stable under asynchrony and evolving model structures, and it comes with convergence guarantees. The approach is evaluated on FEMNIST, CIFAR-10, and ImageNet-100, where it converges faster and reaches higher accuracy than baselines, especially under high heterogeneity and non-IID data distributions. The work also analyzes convergence under mask-aware aggregation and reports ablations indicating that dynamic restoration and the aggregation rule are both central to its performance. Overall, FedGMR offers a practical, theoretically grounded framework for exploiting heterogeneous client capabilities in large-scale FL without sacrificing convergence or accuracy.
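To make the idea of gradual density restoration concrete, the sketch below shows one possible two-stage schedule: a client keeps its initial sub-model density for a fixed number of rounds, then ramps linearly up to the full model. This summary does not specify the paper's actual schedule, so the function name `density_at_round`, the parameters `hold_rounds` and `ramp_rounds`, and the linear ramp itself are all illustrative assumptions.

```python
def density_at_round(round_idx: int, initial_density: float,
                     hold_rounds: int, ramp_rounds: int) -> float:
    """Hypothetical two-stage density schedule (not taken from the paper).

    Stage 1: keep the client's initial density for `hold_rounds` rounds.
    Stage 2: raise the density linearly to 1.0 over `ramp_rounds` rounds.
    """
    if not 0.0 < initial_density <= 1.0:
        raise ValueError("initial_density must be in (0, 1]")
    if round_idx < hold_rounds:
        # Stage 1: small sub-model, cheap to communicate.
        return initial_density
    if ramp_rounds <= 0:
        # Degenerate ramp: jump straight to the full model.
        return 1.0
    # Stage 2: fraction of the ramp completed, capped at 1.0.
    progress = min(1.0, (round_idx - hold_rounds) / ramp_rounds)
    return initial_density + (1.0 - initial_density) * progress


if __name__ == "__main__":
    for r in (0, 10, 20, 30):
        print(r, density_at_round(r, 0.25, hold_rounds=10, ramp_rounds=20))
```

A linear ramp is only the simplest choice; a plateau-triggered or stepwise restoration would fit the same interface.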

Abstract

Federated learning (FL) holds strong potential for distributed machine learning, but in heterogeneous environments, Bandwidth-Constrained Clients (BCCs) often struggle to participate effectively due to limited communication capacity. Their small sub-models learn quickly at first but become under-parameterized in later stages, leading to slow convergence and degraded generalization. We propose FedGMR: Federated Learning with Gradual Model Restoration (GMR) under Asynchrony and Model Heterogeneity. FedGMR progressively increases each client's sub-model density during training, enabling BCCs to remain effective contributors throughout the process. In addition, we develop a mask-aware aggregation rule tailored for asynchronous model-heterogeneous FL (MHFL) and provide convergence guarantees showing that the aggregated error scales with the average sub-model density across clients and rounds, while GMR provably shrinks this gap toward full-model FL. Extensive experiments on FEMNIST, CIFAR-10, and ImageNet-100 demonstrate that FedGMR achieves faster convergence and higher accuracy, especially under high heterogeneity and non-IID settings.
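As a rough illustration of what mask-aware aggregation can look like, the sketch below averages each parameter coordinate only over the clients whose sub-model mask covers it, and leaves uncovered coordinates unchanged. This is a generic per-coordinate masked average, not the paper's exact rule: it ignores buffering, staleness weighting, and asynchrony, and the name `mask_aware_aggregate` and the flat-list parameter layout are assumptions made for the example.

```python
from typing import List, Sequence


def mask_aware_aggregate(global_params: Sequence[float],
                         client_deltas: Sequence[Sequence[float]],
                         client_masks: Sequence[Sequence[int]]) -> List[float]:
    """Apply a per-coordinate masked average of client updates.

    Illustrative only: each coordinate j is updated with the mean delta of
    the clients whose mask is 1 at j; coordinates that no client trained
    keep their current global value.
    """
    if len(client_deltas) != len(client_masks):
        raise ValueError("need exactly one mask per client delta")
    new_params = list(global_params)
    for j in range(len(global_params)):
        # Collect updates only from clients that actually hold coordinate j.
        covered = [delta[j]
                   for delta, mask in zip(client_deltas, client_masks)
                   if mask[j]]
        if covered:
            new_params[j] += sum(covered) / len(covered)
    return new_params


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0]
    deltas = [[0.5, 0.5, 0.0], [1.5, 0.0, 0.0]]
    masks = [[1, 1, 0], [1, 0, 0]]
    print(mask_aware_aggregate(params, deltas, masks))
```

Dividing by the number of covering clients rather than the total client count avoids shrinking updates for coordinates that only a few dense clients hold, which is the usual motivation for making the average mask-aware.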

Paper Structure

This paper contains 61 sections, 5 theorems, 97 equations, 3 figures, 11 tables, 5 algorithms.

Key Result

Lemma 1

Restoring Model Capacity Improves Training Speed Post-Plateau

Figures (3)

  • Figure 1: GMR: client models are heterogeneous but are gradually restored during training.
  • Figure 2: The accuracy growth rate with different model densities.
  • Figure 3: Training dynamics under heterogeneous bandwidth: top row shows training steps over time, bottom row shows accuracy over time, for FEMNIST, CIFAR-10, and ImageNet-100.

Theorems & Definitions (5)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5