Is Aggregation the Only Choice? Federated Learning via Layer-wise Model Recombination

Ming Hu, Zhihao Yue, Xiaofei Xie, Cheng Chen, Yihao Huang, Xian Wei, Xiang Lian, Yang Liu, Mingsong Chen

TL;DR

This work addresses the challenge of weight divergence in non-IID federated learning and proposes FedMR, a novel paradigm that replaces FedAvg aggregation with layer-wise model recombination to steer local training toward flat minima and improve generalization. It introduces a two-stage training scheme combining coarse aggregation-based pre-training with fine-grained recombination, along with a convergence analysis showing rates comparable to FedAvg. Empirical results on CIFAR-10/100 and FEMNIST with multiple models demonstrate that FedMR achieves higher accuracy and smoother convergence than seven state-of-the-art baselines, with robust performance under varying client participation and data heterogeneity. The paper also discusses privacy-preserving mechanisms and acknowledges limitations and future research directions in personalization and fairness.

Abstract

Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from low inference performance because data is unevenly distributed among clients. Specifically, different data distributions among clients lead local models to optimize in different directions, so aggregating local models usually results in a poorly generalized global model that performs worse on most clients. To address this issue, and inspired by the geometric observation that a well-generalized solution is located in a flat area of the loss landscape rather than a sharp one, we propose a novel heuristic FL paradigm named FedMR (Federated Model Recombination). The goal of FedMR is to guide the recombined models to be trained towards a flat area. Unlike conventional FedAvg-based methods, in FedMR the cloud server recombines the collected local models by shuffling each of their layers, generating multiple recombined models for local training on clients rather than a single aggregated global model. Since flat areas are larger than sharp areas, when local models are located in different areas, recombined models have a higher probability of being located in a flat area. When all recombined models are located in the same flat area, they are optimized in the same direction. We theoretically analyze the convergence of model recombination. Experimental results show that, compared with state-of-the-art FL methods, FedMR can significantly improve inference accuracy without exposing the privacy of each client.
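
The layer-wise shuffling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `recombine`, the `Model` type alias, and the representation of a model as a plain list of per-layer objects are all assumptions made for illustration. The only property taken from the abstract is that, for each layer, the server shuffles that layer across the collected local models, so K local models produce K recombined models.

```python
import random
from typing import Any, List, Sequence

# Hypothetical representation: a model is a list of layers, and a layer can be
# any object (e.g. a weight tensor). These names are illustrative assumptions.
Model = List[Any]


def recombine(local_models: Sequence[Model], rng: random.Random) -> List[Model]:
    """Shuffle each layer independently across the K collected local models."""
    if not local_models:
        return []
    num_layers = len(local_models[0])
    if any(len(m) != num_layers for m in local_models):
        raise ValueError("all local models must have the same number of layers")

    k = len(local_models)
    recombined: List[Model] = [[None] * num_layers for _ in range(k)]
    for layer_idx in range(num_layers):
        # Draw a fresh random permutation of the clients for this layer.
        order = list(range(k))
        rng.shuffle(order)
        for target, source in enumerate(order):
            recombined[target][layer_idx] = local_models[source][layer_idx]
    return recombined


if __name__ == "__main__":
    # Three clients, two layers each; strings stand in for weight tensors.
    models = [[f"client{c}/layer{l}" for l in range(2)] for c in range(3)]
    for model in recombine(models, random.Random(0)):
        print(model)
```

Because every layer is permuted rather than averaged, each trained layer appears in exactly one recombined model, which is the key structural difference from producing a single aggregated global model.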

Paper Structure

This paper contains 32 sections, 3 theorems, 11 equations, 13 figures, 1 table, and 1 algorithm.

Key Result

Lemma 4.4

Assume that in FedMR there are $K$ clients participating in every FL training round. Let $\{v^1_r, v^2_r, \ldots, v^K_r\}$ and $\{w^1_r, w^2_r, \ldots, w^K_r\}$ be the set of trained local model weights and the set of recombined model weights generated in the $(r-1)^{th}$ round, respectively. Assume $x$ is a vect

Figures (13)

  • Figure 1: Training processes on the same loss landscape.
  • Figure 2: An example of model recombination.
  • Figure 3: Our FedMR approach.
  • Figure 4: Example of model aggregation and recombination.
  • Figure 5: FedAvg vs. Indep.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Lemma 4.4
  • Lemma A.1
  • Lemma A.2