UNIDEAL: Curriculum Knowledge Distillation Federated Learning

Yuwen Yang; Chang Liu; Xun Cai; Suizhi Huang; Hongtao Lu; Yue Ding

TL;DR

UNIDEAL tackles cross-domain Federated Learning with heterogeneous model architectures by decoupling parameters and sharing only task head parameters, enabling flexible per-client feature extractors. It introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning (CLKD), which uses batch-wise mutual evaluation scores and a cosine-based similarity metric to progressively supervise local heads with a global teacher during knowledge distillation, while linearly decaying the training subset from easy to hard samples. Empirical results across image and tabular cross-domain tasks show that UNIDEAL consistently surpasses state-of-the-art baselines in accuracy and communication efficiency, with CLKD based on cosine similarity providing the strongest gains. The paper also extends the approach to heterogeneous architectures (UNIDEAL-HETE) and proves a non-convex convergence rate of $O(\frac{1}{T})$, highlighting practical impact for scalable, privacy-preserving collaborative learning.
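The parameter decoupling described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: it assumes a FedAvg-style weighted average for the shared head (the summary does not state the aggregation rule), and the names `average_heads`, `load_head`, and the `head.` key prefix are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_heads(
    client_params: List[Params],
    weights: List[float],
    head_prefix: str = "head.",
) -> Params:
    """Weighted average of task-head parameters only (assumed FedAvg-style rule).

    Feature-extractor entries are never read, so each client may use a
    different extractor architecture.
    """
    total = sum(weights)
    head_keys = [k for k in client_params[0] if k.startswith(head_prefix)]
    global_head: Params = {}
    for key in head_keys:
        size = len(client_params[0][key])
        global_head[key] = [
            sum(w * p[key][i] for p, w in zip(client_params, weights)) / total
            for i in range(size)
        ]
    return global_head


def load_head(local: Params, global_head: Params) -> Params:
    """Return a copy of the local parameters with the head overwritten."""
    updated = dict(local)
    updated.update(global_head)
    return updated


# Two clients with different extractors but an identically shaped head.
client_a = {"extractor.conv": [1.0, 2.0, 3.0], "head.fc": [0.0, 2.0]}
client_b = {"extractor.mlp": [9.0], "head.fc": [4.0, 6.0]}

shared = average_heads([client_a, client_b], weights=[1.0, 3.0])
new_a = load_head(client_a, shared)
```

Because only keys under the head prefix are exchanged, the payload size depends on the head alone, which is consistent with the communication-efficiency claim; the extractor entries stay on each client untouched.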

Abstract

Federated Learning (FL) has emerged as a promising approach to enable collaborative learning among multiple clients while preserving data privacy. However, cross-domain FL tasks, where clients possess data from different domains or distributions, remain a challenging problem due to the inherent heterogeneity. In this paper, we present UNIDEAL, a novel FL algorithm specifically designed to tackle the challenges of cross-domain scenarios and heterogeneous model architectures. The proposed method introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning, which significantly enhances the effectiveness of knowledge distillation in FL settings. We conduct extensive experiments on various datasets, comparing UNIDEAL with state-of-the-art baselines. Our results demonstrate that UNIDEAL achieves superior performance in terms of both model accuracy and communication efficiency. Additionally, we provide a convergence analysis of the algorithm, showing a convergence rate of O(1/T) under non-convex conditions.
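One possible reading of the curriculum mechanism is sketched below. It is a sketch under stated assumptions rather than the paper's exact formulation: it assumes that a sample is included in the distillation loss when the cosine similarity between teacher and student outputs meets a threshold, and that the threshold is relaxed linearly over rounds so harder samples are admitted later. The function names, the default threshold values, and the per-sample (rather than batch-wise) scoring are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def linear_threshold(
    round_idx: int, total_rounds: int, start: float = 0.9, end: float = 0.0
) -> float:
    """Linearly move the admission threshold from `start` to `end` (assumed schedule)."""
    frac = min(max(round_idx / max(total_rounds - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def curriculum_mask(
    teacher_out: List[Sequence[float]],
    student_out: List[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Indicator per sample: 1 if teacher and student agree closely enough."""
    return [
        1 if cosine_similarity(t, s) >= threshold else 0
        for t, s in zip(teacher_out, student_out)
    ]


# Toy outputs: sample 0 agrees fully, sample 1 is orthogonal, sample 2 is in between.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

early = curriculum_mask(teacher, student, linear_threshold(0, 11))
middle = curriculum_mask(teacher, student, linear_threshold(5, 11))
late = curriculum_mask(teacher, student, linear_threshold(10, 11))
```

In a training loop, the mask would multiply the per-sample distillation loss, so that early rounds distill only on samples where teacher and student already agree and later rounds cover the full batch.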

Paper Structure (10 sections, 2 theorems, 8 equations, 1 figure, 2 tables)

Introduction
Methodology
Sharing Only Task Head Parameters
Adjustable Teacher-Student Mutual Evaluation Curriculum Learning
Extension for Heterogeneous Architecture Models
Experiments
Experimental Setup
Main Results
Convergence Analysis
Conclusion

Key Result

Lemma 1

Define $\tilde{\mathcal{L}}$ as follows: where the equality follows from the definition of the loss function in eq:final_opt, and the inequality holds because the indicator function in eq:CLloss is at most 1 and by Assumption 1.3 in FedGKD. Notice that any approximate solution $\mathbf{w}_k^{t+1}$ satisfies $\tilde{\mathcal{L}}(\mathbf{

Figures (1)

Figure 1: Test accuracy varies with communication rounds in the DIGIT-NIID-1 setting. UNIDEAL achieves better accuracy improvement with fewer rounds, and reaches higher accuracy than other baselines in the later stage while maintaining stable accuracy.

Theorems & Definitions (2)

Lemma 1
Theorem 1: Convergence
