FedMerge: Federated Personalization via Model Merging

Shutong Chen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

TL;DR

FedMerge tackles non-IID federated learning by maintaining a server-side model soup of $d$ global models and forming a personalized model for each client $i$ via weighted merging, $\theta_i = \sum_j w_{(i,j)} \Theta_j$. The server optimizes both the global models and the per-client merging weights; merging is performed on the server, and only the merged model is sent to each client. A backpropagation-like gradient flow propagates each client's loss back through its merged model to update $\Theta_j$ and $w_{(i,j)}$, with a row-wise softmax keeping each client's weights normalized. Across CIFAR-100, PACS, and foundation-model FL settings, FedMerge consistently outperforms single-model baselines and many multi-model baselines, especially as the model soup grows, while keeping client resource usage low and reducing the local-global gap.
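The merging rule and the gradient flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the flattened-parameter layout, and the toy quadratic client loss are assumptions made for illustration, and the gradients are simply the chain rule applied to $\theta_i = \sum_j w_{(i,j)} \Theta_j$ with a row-wise softmax over hypothetical logits.

```python
import numpy as np

def row_softmax(logits):
    """Normalize each row of an (m clients x d models) logit matrix to sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def merge(weights, global_models):
    """theta_i = sum_j w_(i,j) * Theta_j for every client at once.

    weights:       (m, d) row-normalized merging weights
    global_models: (d, p) flattened parameters of the d global models
    returns:       (m, p) one merged parameter vector per client
    """
    return weights @ global_models

def merge_backward(weights, global_models, client_grads):
    """Chain rule through the merge, given client_grads[i] = dL_i/dtheta_i."""
    grad_models = weights.T @ client_grads         # sum_i w_(i,j) * dL_i/dtheta_i
    grad_weights = client_grads @ global_models.T  # <dL_i/dtheta_i, Theta_j>
    return grad_models, grad_weights

def softmax_backward(weights, grad_weights):
    """Map gradients w.r.t. softmax outputs to gradients w.r.t. the logits."""
    inner = (weights * grad_weights).sum(axis=1, keepdims=True)
    return weights * (grad_weights - inner)

# Toy setup (all sizes and data are hypothetical).
rng = np.random.default_rng(0)
m, d, p = 4, 3, 5                      # clients, global models, parameters
logits = rng.normal(size=(m, d))       # unconstrained merging logits
Theta = rng.normal(size=(d, p))        # global models, one per row
targets = rng.normal(size=(m, p))      # stand-in for each client's optimum

W = row_softmax(logits)
theta = merge(W, Theta)
# Assumed client loss: L_i = 0.5 * ||theta_i - target_i||^2, so the gradient is:
client_grads = theta - targets
grad_Theta, grad_W = merge_backward(W, Theta, client_grads)
grad_logits = softmax_backward(W, grad_W)
```

In this sketch only `theta[i]` would be sent to client `i`, and only `client_grads[i]` (or an equivalent model update) would come back; the global models and merging weights never leave the server.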

Abstract

One global model in federated learning (FL) might not be sufficient to serve many clients with non-IID tasks and distributions. While there have been advances in FL that train multiple global models for better personalization, they offer clients only a limited set of choices, so local finetuning remains indispensable. In this paper, we propose a novel "FedMerge" approach that creates a personalized model per client by simply merging multiple global models with automatically optimized and customized weights. In FedMerge, a few global models can serve many non-IID clients, even without further local finetuning. We formulate this problem as a joint optimization of the global models and the merging weights for each client. Unlike existing FL approaches where the server broadcasts one or multiple global models to all clients, the server only needs to send a customized, merged model to each client. Moreover, instead of periodically interrupting local training and re-initializing it to a global model, the merged model aligns better with each client's task and data distribution, smoothing the local-global gap between consecutive rounds caused by client drift. We evaluate FedMerge on three different non-IID settings applied to different domains with diverse tasks and data types, in which FedMerge consistently outperforms existing FL approaches, including clustering-based and mixture-of-experts (MoE) based methods.
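One way to write down the joint optimization mentioned in the abstract is sketched below. This is a reconstruction from the merging rule $\theta_i = \sum_j w_{(i,j)} \Theta_j$ and the row-wise normalization of the weights, not the paper's exact formulation: the number of clients $m$ and the per-client loss $\mathcal{L}_i$ are symbols assumed here for illustration.

```latex
\min_{\{\Theta_j\}_{j=1}^{d},\; \{w_{(i,j)}\}}
  \; \sum_{i=1}^{m} \mathcal{L}_i(\theta_i)
\quad \text{s.t.} \quad
  \theta_i = \sum_{j=1}^{d} w_{(i,j)} \Theta_j,
\qquad
  w_{(i,j)} \ge 0,
\qquad
  \sum_{j=1}^{d} w_{(i,j)} = 1 \;\; \text{for every client } i.
```

Under this reading, the $d$ global models are shared across all clients, while each client owns one row of merging weights, which is what allows a few global models to serve many clients.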

Paper Structure

This paper contains 26 sections, 28 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison between MoE-like FL methods and the proposed FedMerge. Both methods perform weight averaging over global models. MoE-like FL performs weight averaging on each client, while FedMerge performs weight averaging on the server.
  • Figure 2: The update information flow of global models and merging weights for FedMerge. The update information for each global model is derived from the updates of all clients. The update information for each merging weight comes from the global model and merged model it connects.
  • Figure 3: Server Parameter Usage vs. Accuracy. The x-axis represents multiples of ResNet-9's parameter count. In FedMerge, we fix the architecture to ResNet-9 and vary the number of global models over 5, 10, 15, 20, 25, and 30.
  • Figure 4: Client Parameter Usage vs. Accuracy. The x-axis represents multiples of ResNet-9's parameter count. In FedMerge, we fix the number of global models at 15, all using the same architecture, selected from ResNet-9, ResNet-18, ResNet-34, ResNet-50, and ResNet-101.
  • Figure 5: Visualization of merge weights with 10 global models and 50 clients. Each row denotes one normalized weight vector for each client. In Cluster Non-IID, we separate ground-truth clusters with white dashed lines.
  • ...and 2 more figures