FedMerge: Federated Personalization via Model Merging
Shutong Chen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
TL;DR
FedMerge tackles non-IID federated learning by maintaining a server-side model soup of $d$ global models $\Theta_1, \dots, \Theta_d$ and forming a personalized model for each client $i$ by weighted merging, $\theta_i = \sum_j w_{(i,j)} \Theta_j$. The server jointly optimizes the global models and the per-client merging weights. Merging is performed on the server, so each client receives only its own merged model. Gradients of each client's loss are propagated back through the merge, in a backpropagation-like fashion, to update $\Theta_j$ and $w_{(i,j)}$, and a row-wise softmax keeps each client's weights normalized. Across CIFAR-100, PACS, and foundation-model FL settings, FedMerge consistently outperforms single-model baselines and many multi-model baselines, especially as the model soup grows, while remaining efficient in client resources and reducing the local-global gap.
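The merge and the gradient flow back through it can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each model is flattened into a parameter vector, that the row-softmax is applied to unconstrained per-client logits (named `logits` here), and that each client reports the gradient of its loss with respect to its merged model. All function and variable names are hypothetical.

```python
import numpy as np

def row_softmax(logits):
    """Turn each row of `logits` into non-negative weights that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def merge(logits, global_models):
    """Compute theta_i = sum_j w_(i,j) * Theta_j for every client i.

    logits:        (n_clients, d) unconstrained merging parameters (assumption)
    global_models: (d, p) flattened global models Theta_1..Theta_d
    returns:       (n_clients, p) merged per-client models
    """
    return row_softmax(logits) @ global_models

def server_grads(logits, global_models, client_grads):
    """Chain rule through the merge.

    client_grads: (n_clients, p), row i is dL_i / dtheta_i from client i.
    returns gradients w.r.t. the global models and the merging logits.
    """
    w = row_softmax(logits)
    # dL/dTheta_j = sum_i w_(i,j) * dL_i/dtheta_i
    grad_global = w.T @ client_grads
    # dL_i/dw_(i,j) = <dL_i/dtheta_i, Theta_j>
    grad_w = client_grads @ global_models.T
    # Softmax Jacobian, applied row by row.
    grad_logits = w * (grad_w - (w * grad_w).sum(axis=1, keepdims=True))
    return grad_global, grad_logits
```

In this sketch the clients never see the individual $\Theta_j$; they receive one row of `merge(...)` and send back one row of `client_grads`, which matches the communication pattern described above.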
Abstract
One global model in federated learning (FL) might not be sufficient to serve many clients with non-IID tasks and distributions. While there have been advances in FL that train multiple global models for better personalization, they offer clients only a limited set of choices, so local finetuning remains indispensable. In this paper, we propose a novel "FedMerge" approach that creates a personalized model per client by simply merging multiple global models with automatically optimized and customized weights. In FedMerge, a few global models can serve many non-IID clients, even without further local finetuning. We formulate this problem as a joint optimization of the global models and the merging weights for each client. Unlike existing FL approaches, in which the server broadcasts one or multiple global models to all clients, FedMerge's server only needs to send a customized, merged model to each client. Moreover, rather than periodically interrupting local training and re-initializing it to a global model, FedMerge uses a merged model that aligns better with each client's task and data distribution, smoothing the local-global gap between consecutive rounds caused by client drift. We evaluate FedMerge in three different non-IID settings spanning different domains with diverse tasks and data types, in which FedMerge consistently outperforms existing FL approaches, including clustering-based and mixture-of-experts (MoE) based methods.
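One plausible way to write the joint optimization is sketched below. The notation is an assumption, not taken from the paper: $n$ clients, client loss $\mathcal{L}_i$, an unweighted sum over clients, and merging weights obtained from unconstrained logits $a_{(i,j)}$ through a row-wise softmax.

```latex
\min_{\{\Theta_j\}_{j=1}^{d},\ \{a_{(i,j)}\}}
  \ \sum_{i=1}^{n} \mathcal{L}_i(\theta_i),
\qquad
\theta_i = \sum_{j=1}^{d} w_{(i,j)}\, \Theta_j,
\qquad
w_{(i,j)} = \frac{\exp\big(a_{(i,j)}\big)}{\sum_{k=1}^{d} \exp\big(a_{(i,k)}\big)}
```

Under this form each client's weights lie on the probability simplex, so every personalized model $\theta_i$ is a convex combination of the $d$ global models.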
