FedIN: Federated Intermediate Layers Learning for Model Heterogeneity

Yun-Hin Chan, Zhihan Jiang, Jing Deng, Edith C.-H. Ngai

TL;DR

FedIN tackles heterogeneous federated learning by partitioning client models into an extractor, intermediate layers, and classifier, and exchanging a single batch of features $s_{in}$ and $s_{out}$ to perform IN training on the intermediate layers. It couples local training with a convex gradient-alignment step to mitigate gradient divergence, enabling effective layer-wise aggregation across diverse architectures. Empirical results on CIFAR-10, Fashion-MNIST, and SVHN show FedIN achieves higher accuracy and faster convergence than seven baselines under both IID and non-IID data, with modest overhead. Ablation studies and visualizations (CKA, t-SNE) support the benefits of IN training and gradient alignment in aligning representations across heterogeneous clients.
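To make the three-part partition concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class `ClientModel`, the function `in_loss`, the layer sizes, the ReLU activations, and the use of mean-squared error as the IN objective are all assumptions chosen for illustration. It shows how one client can capture a batch of features $(s_{in}, s_{out})$ and how a second client with a deeper intermediate stack can score its own intermediate layers against them.

```python
import random


def linear(weights, x):
    """Apply a bias-free linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def init(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ClientModel:
    """Hypothetical three-part model: extractor -> intermediate -> classifier."""

    def __init__(self, in_dim, feat_in, feat_out, hidden, n_classes, depth, seed):
        rng = random.Random(seed)
        self.extractor = init(feat_in, in_dim, rng)
        # Only the intermediate depth differs between clients (assumption),
        # so s_in and s_out have the same shape on every client.
        dims = [feat_in] + [hidden] * (depth - 1) + [feat_out]
        self.intermediate = [init(dims[i + 1], dims[i], rng) for i in range(depth)]
        self.classifier = init(n_classes, feat_out, rng)

    def run_intermediate(self, s_in):
        h = s_in
        for layer in self.intermediate:
            h = relu(linear(layer, h))
        return h

    def forward(self, x):
        s_in = relu(linear(self.extractor, x))   # output of the extractor
        s_out = self.run_intermediate(s_in)      # input of the classifier
        logits = linear(self.classifier, s_out)
        return s_in, s_out, logits


def in_loss(model, s_in_batch, s_out_batch):
    """Assumed IN objective: MSE between intermediate(s_in) and s_out."""
    total, count = 0.0, 0
    for s_in, s_out in zip(s_in_batch, s_out_batch):
        pred = model.run_intermediate(s_in)
        total += sum((p - t) ** 2 for p, t in zip(pred, s_out))
        count += len(s_out)
    return total / count


# Client A (shallow) captures one batch of features; client B (deeper) scores them.
client_a = ClientModel(4, 3, 2, 5, 2, depth=1, seed=0)
client_b = ClientModel(4, 3, 2, 5, 2, depth=3, seed=1)
batch = [[0.1, 0.2, 0.3, 0.4], [1.0, -1.0, 0.5, 0.0]]
features = [client_a.forward(x) for x in batch]
s_in_batch = [f[0] for f in features]
s_out_batch = [f[1] for f in features]
loss_b = in_loss(client_b, s_in_batch, s_out_batch)
```

In a real system `loss_b` would be minimized with respect to client B's intermediate weights alongside its local training loss; the sketch stops at computing the loss so that it stays dependency-free.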

Abstract

Federated learning (FL) enables edge devices to cooperatively train a global shared model while keeping the training data local and private. However, a common assumption in FL requires the participating edge devices to have similar computation resources and to train on an identical global model architecture. In this study, we propose an FL method called Federated Intermediate Layers Learning (FedIN), which supports heterogeneous models without relying on any public dataset. Instead, FedIN leverages the inherent knowledge embedded in client model features to facilitate knowledge exchange. The training models in FedIN are partitioned into three distinct components: an extractor, intermediate layers, and a classifier. We capture client features by extracting the outputs of the extractor and the inputs of the classifier. To harness the knowledge from client features, we propose IN training, which aligns the intermediate layers based on features obtained from other clients. IN training requires only minimal memory and communication overhead because it uses a single batch of client features. Additionally, we formulate and address a convex optimization problem to mitigate the gradient divergence caused by conflicts between IN training and local training. The experimental results demonstrate the superior performance of FedIN in heterogeneous model environments compared to state-of-the-art algorithms. Furthermore, our ablation study demonstrates the effectiveness of IN training and of the proposed solution for alleviating gradient divergence.
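The abstract does not spell out the convex problem used to resolve conflicts between the IN-training gradient and the local-training gradient. As an illustration only, the sketch below uses a standard stand-in: find the vector closest to the IN gradient that does not oppose the local gradient, i.e. $\min_g \|g - g_{in}\|^2$ subject to $\langle g, g_{local} \rangle \ge 0$, which has a closed-form solution. The formulation, the function name `align_gradient`, and the flat-list gradient representation are assumptions, not the paper's stated method.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def align_gradient(g_in, g_local):
    """Project g_in onto the half-space {g : <g, g_local> >= 0}.

    Assumed stand-in problem: min_g ||g - g_in||^2  s.t.  <g, g_local> >= 0.
    If the two gradients already agree, g_in is returned unchanged;
    otherwise the component of g_in that opposes g_local is removed.
    """
    inner = dot(g_in, g_local)
    norm_sq = dot(g_local, g_local)
    if inner >= 0 or norm_sq == 0.0:
        return list(g_in)
    scale = inner / norm_sq
    return [a - scale * b for a, b in zip(g_in, g_local)]


# No conflict: the IN gradient is kept as-is.
kept = align_gradient([1.0, 0.0], [1.0, 1.0])
# Conflict: the result becomes orthogonal to the local gradient.
fixed = align_gradient([-1.0, 0.0], [1.0, 1.0])
```

The projected gradient would then be combined with the local gradient in the optimizer step; how the two are weighted is left open here.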

Paper Structure (32 sections, 14 equations, 9 figures, 8 tables, 1 algorithm)

Figures (9)

  • Figure 1: An illustration of model heterogeneity. Clients participate in federated learning with different available resources, which leads to different model architectures.
  • Figure 2: Model architectures and the training process of FedIN. Blue arrows represent the transmission of client features, i.e., the feature inputs and feature outputs $(s_{in}, s_{out})$. The process is as follows. ① Clients receive client features and global weights $\bar{w}$ from the server. ② After updating their weights with the global weights, clients train their models on their local private datasets and perform IN training on the client features $(s_{in}, s_{out})$ received from the server. ③ Upon completing local training, clients transmit their model weights and new client features, denoted $(w_k, s_{in}, s_{out})$, to the server. The aggregation methods for system heterogeneity are discussed in \ref{sec:aggregation}.
  • Figure 3: Illustrations for IID data and non-IID data with $\alpha=0.5$.
  • Figure 4: Smoothed test accuracy on non-IID CIFAR-10 data. The grey lines show the original (unsmoothed) accuracy. The red dotted line denotes the target accuracy in \ref{tab:acc_CIFAR10}.
  • Figure 5: Illustrations for CKA similarity of IID data and non-IID data with CIFAR-10.
  • ...and 4 more figures
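
Figure 5 compares client representations using CKA similarity. The caption does not say which CKA variant is used; the sketch below assumes the common linear form, $\mathrm{CKA}(X, Y) = \|Y^\top X\|_F^2 / (\|X^\top X\|_F \, \|Y^\top Y\|_F)$ on column-centered feature matrices whose rows are examples. The function names and the toy matrices are illustrative only.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frobenius_sq(x, y):
    """Squared Frobenius norm of x^T y for row-major matrices."""
    total = 0.0
    for i in range(len(x[0])):
        for j in range(len(y[0])):
            s = sum(x[k][i] * y[k][j] for k in range(len(x)))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows."""
    x, y = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(x, y)
    denominator = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return numerator / denominator


# Toy features from two hypothetical clients on the same four inputs.
feats_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
feats_b = [[0.5, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 3.0, 1.0]]
similarity = linear_cka(feats_a, feats_b)
```

The two matrices may have different feature widths, which is what makes CKA usable across heterogeneous architectures; the function divides by zero if either matrix has only constant columns, so a guard would be needed in practice.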