Federated Mutual Learning

Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, Chao Wu

TL;DR

This work identifies data, objective, and model heterogeneity (DOM) as central challenges in federated learning and proposes Federated Mutual Learning (FML) to address them. FML equips each client with a meme (global fork) model and a personalized local model, and uses deep mutual learning to exchange knowledge via KL-based losses during local updates; meme models are aggregated to progressively refine the global model while clients retain personalized components. Empirical results on MNIST and CIFAR datasets show that FML outperforms FedAvg and FedProx under both IID and Non-IID conditions and remains effective under DOM scenarios involving data, model, and task heterogeneity. The approach preserves privacy by keeping personalized models local and demonstrates robustness to heterogeneity while revealing phenomena such as catfish effects and the benefits of dynamic balancing of learning signals.

Abstract

Federated learning (FL) enables collaborative training of deep learning models on decentralized data. However, three types of heterogeneity in the FL setting pose distinct challenges to the canonical federated learning algorithm (FedAvg). First, because the data are Non-IID, the global shared model may perform worse than local models trained solely on private data. Second, the objectives of the central server and the clients may differ: the server seeks a generalized model whereas each client pursues a personalized model, and clients may run different tasks. Third, clients may need to design customized models for various scenes and tasks. In this work, we present a novel federated learning paradigm, named Federated Mutual Learning (FML), that deals with all three heterogeneities. FML allows clients to train a generalized model collaboratively and a personalized model independently, and to design their own private customized models. Thus, the Non-IIDness of data is no longer a bug but a feature, since clients can be served better on a personal basis. Experiments show that FML achieves better performance than alternatives in the typical FL setting, and that clients with different models and tasks can benefit from FML.
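
The mutual-learning step between a client's two models can be illustrated with a small sketch. It assumes the usual deep mutual learning form, in which each model minimizes a cross-entropy term plus a KL term that pulls its prediction toward the other model's prediction. The function names, the weights `alpha` and `beta`, and the direction of each KL term are illustrative assumptions, not details stated in the text above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mutual_losses(local_logits, meme_logits, label, alpha=0.5, beta=0.5):
    """Return (loss_local, loss_meme) for one example.

    alpha and beta are hypothetical weights balancing the supervised term
    against the KL term for the personalized and meme model respectively.
    """
    p_local = softmax(local_logits)
    p_meme = softmax(meme_logits)
    # Assumed convention: each model treats the other's prediction as the target.
    loss_local = (alpha * cross_entropy(p_local, label)
                  + (1 - alpha) * kl_divergence(p_meme, p_local))
    loss_meme = (beta * cross_entropy(p_meme, label)
                 + (1 - beta) * kl_divergence(p_local, p_meme))
    return loss_local, loss_meme


# Example: the two models disagree on a two-class input whose true class is 0.
loss_local, loss_meme = mutual_losses([5.0, 0.0], [0.0, 5.0], label=0)
```

When both models output the same distribution, the KL terms vanish and each loss reduces to its weighted cross-entropy, so the KL term acts only where the two models disagree.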

Paper Structure

This paper contains 27 sections, 5 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Typical federated learning usually mentions the Non-IIDness of data, which is data heterogeneity (DH). In addition, the server and clients in FL usually train a single model with the same architecture on a single task. However, the objective of the server may differ from that of the clients, and clients may run different tasks (OH). Clients may therefore want to train different models (MH).
  • Figure 2: a) Each client in FML trains two models over private data during local update: the meme model and the personalized model; b) At each communication round, each client forks the new generation of the global model as its meme model, while the personalized model is trained privately and continuously; c) During each local update, the two models on a client conduct deep mutual learning (DML) for several epochs, learning mutually.
  • Figure 3: FML yields larger improvements than FedAvg and FedProx in four data settings. We simulate different levels of data heterogeneity (difficulty increases from left to right). Due to DML, the $D_{KL}$ loss term is a strong regularizer for training. In the Non-IID setting, FML performs better and follows a steady trajectory, whereas FedProx and FedAvg oscillate severely. According to Zhang et al. (2018), FML can find a more steady (robust) minimum.
  • Figure 5: The solid curve is the accuracy of the personalized model trained with FML, and the dashed line is the best accuracy of the personalized model trained independently, both measured on the private validation set. The results show that FML can benefit all clients with different models through a shared model.
  • Figure 6: The green and red curves refer to LeNet5 and CNN1 trained independently on CIFAR10 and CIFAR100, respectively. The blue and orange curves refer to the two models trained with FML. The results show that FML can benefit all clients with different tasks through a shared representation.
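
The round structure described in the Figure 2 caption (fork the global model as the meme model, update locally, aggregate the meme models) can be sketched as follows. This is a minimal illustration under stated assumptions: models are flat lists of floats, aggregation is a sample-count-weighted average in the style of FedAvg, and `toy_local_update` is a hypothetical stand-in for the DML epochs rather than the paper's training procedure. All names are invented for the example.

```python
def average_params(param_sets, weights):
    """Weighted element-wise average of equally sized parameter vectors."""
    total = float(sum(weights))
    dim = len(param_sets[0])
    return [sum(w * ps[i] for ps, w in zip(param_sets, weights)) / total
            for i in range(dim)]


def fml_round(global_params, clients, local_update):
    """One communication round; returns the next global parameters.

    Each client is a dict with a private "personal" model and private "data".
    Only the meme models are sent back; personalized models never leave the client.
    """
    memes, sizes = [], []
    for client in clients:
        meme = list(global_params)  # fork: copy the current global model
        meme, client["personal"] = local_update(
            meme, client["personal"], client["data"])
        memes.append(meme)
        sizes.append(len(client["data"]))
    # Assumption: weight each meme model by the client's number of samples.
    return average_params(memes, sizes)


def toy_local_update(meme, personal, data):
    """Placeholder for the DML epochs: move both models halfway to the data mean."""
    target = sum(data) / len(data)
    meme = [p + 0.5 * (target - p) for p in meme]
    personal = [p + 0.5 * (target - p) for p in personal]
    return meme, personal


# Personalized models may differ in size from the global model and from each other.
clients = [
    {"personal": [0.0], "data": [1.0, 1.0]},
    {"personal": [0.0, 0.0, 0.0], "data": [3.0]},
]
global_params = [0.0, 0.0]
new_global = fml_round(global_params, clients, toy_local_update)
```

Because only the meme models share the global architecture, the personalized models in this sketch are free to have different shapes, which mirrors the model-heterogeneity setting.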