Federated Model Heterogeneous Matryoshka Representation Learning

Liping Yi; Han Yu; Chao Ren; Gang Wang; Xiaoguang Liu; Xiaoxiao Li

TL;DR

This work addresses heterogeneity in federated learning by introducing FedMRL, which couples heterogeneous client models with a shared global homogeneous small model to enable richer cross-model knowledge exchange. It introduces Adaptive Representation Fusion to create a personalized fused representation and Multi-Granular Representation Learning to build Matryoshka representations that are processed by both the global and the local headers. Theoretical analysis establishes a non-convex convergence rate of $\mathcal{O}(1/T)$, and extensive experiments on CIFAR-10/100 demonstrate notable accuracy gains (up to $8.48\%$ over the best state-of-the-art baseline and $24.94\%$ over the best same-category baseline) with lower communication and competitive computation costs. The approach preserves privacy by keeping heterogeneous local models on-device and exchanging only the global small model, making FedMRL practical for real-world, resource-constrained FL deployments. Overall, FedMRL advances personalized, communication-efficient, and privacy-preserving MHeteroFL with robust performance under non-IID data.
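Because only the shared homogeneous small model is exchanged, the server-side step reduces to averaging identically shaped parameters from each client. The following is a minimal sketch, assuming FedAvg-style weighting by each client's local sample count; the weighting rule, the dict-of-lists parameter format, and the name `aggregate_small_models` are assumptions for illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical format: a model is a dict from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def aggregate_small_models(updates: List[Tuple[Params, int]]) -> Params:
    """Average the clients' copies of the shared small model.

    `updates` holds (parameters, num_local_samples) pairs. Only the homogeneous
    small model reaches the server; heterogeneous local models stay on-device.
    Assumption: FedAvg-style weighting by local sample count.
    """
    if not updates:
        raise ValueError("no client updates to aggregate")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one sample across clients")
    first, _ = updates[0]
    # Fresh accumulators, so the clients' parameter lists are never mutated.
    agg: Params = {name: [0.0] * len(vals) for name, vals in first.items()}
    for params, n in updates:
        weight = n / total
        for name, vals in params.items():
            acc = agg[name]
            for i, v in enumerate(vals):
                acc[i] += weight * v
    return agg
```

For example, two clients holding 100 and 300 samples contribute with weights 0.25 and 0.75, so `{"fc.bias": [0.0]}` and `{"fc.bias": [1.0]}` average to `[0.75]`.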

Abstract

Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the Federated model heterogeneous Matryoshka Representation Learning (FedMRL) approach for supervised learning tasks. It adds an auxiliary small homogeneous model shared by clients with heterogeneous local models. (1) The generalized and personalized representations extracted by the two models' feature extractors are fused by a personalized lightweight representation projector. This step enables representation fusion to adapt to local data distribution. (2) The fused representation is then used to construct Matryoshka representations with multi-dimensional and multi-granular embedded representations learned by the global homogeneous model header and the local heterogeneous model header. This step facilitates multi-perspective representation learning and improves model learning capability. Theoretical analysis shows that FedMRL achieves a $O(1/T)$ non-convex convergence rate. Extensive experiments on benchmark datasets demonstrate its superior model accuracy with low communication and computational costs compared to seven state-of-the-art baselines. It achieves up to 8.48% and 24.94% accuracy improvement compared with the state-of-the-art and the best same-category baseline, respectively.
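The two steps in the abstract can be sketched as a single client-side loss. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: fusion is assumed to concatenate the two representations and apply one linear projector; the coarse Matryoshka representation is assumed to be the first `coarse_dim` entries of the fused vector and to feed the global header, while the full fused vector feeds the local header; the two cross-entropy losses are combined with hypothetical weights `w_global` and `w_local`. All function and parameter names are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output, one column per input


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def fedmrl_style_loss(
    rep_global: Vector,                  # from the shared small model's feature extractor
    rep_local: Vector,                   # from the client's heterogeneous feature extractor
    proj_W: Matrix, proj_b: Vector,      # personalized projector (hypothetical shapes)
    head_g_W: Matrix, head_g_b: Vector,  # global homogeneous header
    head_l_W: Matrix, head_l_b: Vector,  # local heterogeneous header
    label: int,
    coarse_dim: int,
    w_global: float = 1.0,
    w_local: float = 1.0,
) -> float:
    # (1) Fusion (assumed): splice both representations, then project.
    fused = linear(proj_W, proj_b, rep_global + rep_local)
    if not 0 < coarse_dim <= len(fused):
        raise ValueError("coarse_dim must be in (0, len(fused)]")
    # (2) Matryoshka nesting: the coarse representation is a prefix of the fine one.
    coarse = fused[:coarse_dim]
    fine = fused
    loss_g = cross_entropy(linear(head_g_W, head_g_b, coarse), label)
    loss_l = cross_entropy(linear(head_l_W, head_l_b, fine), label)
    return w_global * loss_g + w_local * loss_l
```

Because `coarse` is a prefix of `fine`, the leading dimensions of the fused representation receive training signal from both headers, which is the nesting property Matryoshka representation learning relies on.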

Paper Structure (31 sections, 4 theorems, 32 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Work
The Proposed FedMRL Approach
Adaptive Representation Fusion
Multi-Granular Representation Learning
Convergence Analysis
Experimental Evaluation
Experiment Setup
Results and Discussion
Average Test Accuracy
Individual Client Test Accuracy
Communication Cost
Computation Overhead
Case Studies
Robustness to Non-IIDness (Class)
...and 16 more sections

Key Result

Lemma 1

Local Training. Given the Lipschitz-smoothness assumption and the unbiased-gradient assumption, the loss of an arbitrary client's local model $w$ in local training round $(t+1)$ is bounded by:
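For context, and not as a restatement of Lemma 1, the standard one-step descent inequality that local-training bounds of this kind are typically built from is shown below. Assumptions: $\mathcal{L}$ is $L_1$-smooth, $g$ is an unbiased stochastic gradient whose variance is at most $\sigma^2$, and $\eta$ is the learning rate; the paper's exact indexing and constants may differ.

```latex
\begin{aligned}
\mathcal{L}(w - \eta g)
  &\le \mathcal{L}(w) - \eta \langle \nabla\mathcal{L}(w), g \rangle
     + \frac{L_1 \eta^2}{2}\lVert g \rVert_2^2
  && \text{($L_1$-smoothness)} \\
\mathbb{E}\big[\mathcal{L}(w - \eta g)\big]
  &\le \mathcal{L}(w) - \eta \lVert \nabla\mathcal{L}(w) \rVert_2^2
     + \frac{L_1 \eta^2}{2}\Big(\lVert \nabla\mathcal{L}(w) \rVert_2^2 + \sigma^2\Big)
  && \text{($\mathbb{E}[g] = \nabla\mathcal{L}(w)$, $\mathbb{E}\lVert g - \nabla\mathcal{L}(w) \rVert_2^2 \le \sigma^2$)} \\
  &= \mathcal{L}(w) + \Big(\frac{L_1 \eta^2}{2} - \eta\Big)\lVert \nabla\mathcal{L}(w) \rVert_2^2
     + \frac{L_1 \eta^2 \sigma^2}{2}
\end{aligned}
```

Summing this inequality over the $E$ local steps of a round bounds the end-of-round loss by the start-of-round loss plus a variance term, and the gradient term is a decrease only when $\eta < 2/L_1$.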

Figures (7)

Figure 1: Left: Matryoshka Representation Learning. Right: Feature extractor and prediction header.
Figure 2: The workflow of FedMRL.
Figure 3: Left six: average test accuracy vs. communication rounds. Right two: individual clients' test accuracy (%) differences (FedMRL - FedProto).
Figure 4: Communication rounds, number of communicated parameters, and computation FLOPs required to reach $90\%$ and $50\%$ average test accuracy targets on CIFAR-10 and CIFAR-100.
Figure 5: Robustness to non-IIDness (Class & Dirichlet).
...and 2 more figures
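Figure 4 counts the parameters exchanged until a target accuracy is reached. A back-of-the-envelope sketch, assuming each participating client downloads and uploads only the shared small model once per round; the `communicated_params` helper and the example sizes are hypothetical.

```python
def communicated_params(rounds: int, clients_per_round: int, shared_params: int) -> int:
    """Total parameters moved when every participating client downloads and
    uploads the shared small model once per round (hypothetical accounting)."""
    if min(rounds, clients_per_round, shared_params) < 0:
        raise ValueError("arguments must be non-negative")
    per_client_per_round = 2 * shared_params  # one download plus one upload
    return rounds * clients_per_round * per_client_per_round


# Example with made-up sizes: 100 rounds, 10 clients per round, a 50k-parameter shared model.
total = communicated_params(100, 10, 50_000)  # 100,000,000 parameters
```

Under this accounting the cost depends only on the size of the shared small model and the number of rounds, not on the size of any client's heterogeneous local model.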

Theorems & Definitions (8)

Lemma 1
Lemma 2
Theorem 1
Theorem 2
Proof 1
Proof 2
Proof 3
Proof 4
