UNIDEAL: Curriculum Knowledge Distillation Federated Learning

Yuwen Yang, Chang Liu, Xun Cai, Suizhi Huang, Hongtao Lu, Yue Ding

TL;DR

UNIDEAL tackles cross-domain Federated Learning with heterogeneous model architectures by decoupling parameters and sharing only task head parameters, enabling flexible per-client feature extractors. It introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning (CLKD), which uses batch-wise mutual evaluation scores and a cosine-based similarity metric to progressively supervise local heads with a global teacher during knowledge distillation, while linearly decaying the training subset from easy to hard samples. Empirical results across image and tabular cross-domain tasks show that UNIDEAL consistently surpasses state-of-the-art baselines in accuracy and communication efficiency, with CLKD based on cosine similarity providing the strongest gains. The paper also extends the approach to heterogeneous architectures (UNIDEAL-HETE) and proves a non-convex convergence rate of $O(\frac{1}{T})$, highlighting practical impact for scalable, privacy-preserving collaborative learning.
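The curriculum distillation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the softmax-over-logits scoring, the KL distillation loss, the "highest score first" selection rule, and the schedule endpoints `start` and `end` are all hypothetical choices made for the example, since the summary does not specify them.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cosine_scores(teacher_logits, student_logits, eps=1e-12):
    # Per-sample cosine similarity between teacher and student predictions.
    # Assumption: a high score means teacher and student agree ("easy" sample).
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    num = (t * s).sum(axis=1)
    den = np.linalg.norm(t, axis=1) * np.linalg.norm(s, axis=1) + eps
    return num / den

def linear_fraction(round_idx, total_rounds, start=1.0, end=0.5):
    # Linear schedule for the fraction of each batch used for distillation.
    # Assumption: the endpoints are illustrative, not taken from the paper.
    if total_rounds <= 1:
        return start
    alpha = round_idx / (total_rounds - 1)
    return start + alpha * (end - start)

def curriculum_kd_loss(teacher_logits, student_logits, fraction):
    # Keep the top-scoring fraction of the batch, then average the
    # KL(teacher || student) distillation loss over the kept samples.
    scores = cosine_scores(teacher_logits, student_logits)
    k = max(1, int(fraction * len(scores)))
    keep = np.argsort(-scores)[:k]
    t = softmax(teacher_logits[keep])
    s = softmax(student_logits[keep])
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=1)
    return kl.mean(), keep
```

In a training loop, one would call `linear_fraction` once per communication round and pass the result to `curriculum_kd_loss` for each batch, with the global head acting as teacher and the local head as student.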

Abstract

Federated Learning (FL) has emerged as a promising approach to enable collaborative learning among multiple clients while preserving data privacy. However, cross-domain FL tasks, where clients possess data from different domains or distributions, remain a challenging problem due to the inherent heterogeneity. In this paper, we present UNIDEAL, a novel FL algorithm specifically designed to tackle the challenges of cross-domain scenarios and heterogeneous model architectures. The proposed method introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning, which significantly enhances the effectiveness of knowledge distillation in FL settings. We conduct extensive experiments on various datasets, comparing UNIDEAL with state-of-the-art baselines. Our results demonstrate that UNIDEAL achieves superior performance in terms of both model accuracy and communication efficiency. Additionally, we provide a convergence analysis of the algorithm, showing a convergence rate of O(1/T) under non-convex conditions.

Paper Structure (10 sections, 2 theorems, 8 equations, 1 figure, 2 tables)

This paper contains 10 sections, 2 theorems, 8 equations, 1 figure, and 2 tables.

Key Result

Lemma 1

Define $\tilde{\mathcal{L}}$ as follows: where the equality follows from the definition of the loss function (eq:final_opt), and the inequality holds because the indicator function in eq:CLloss is less than or equal to 1, together with Assumption 1.3 in FedGKD. Notice that for any approximate solution $\mathbf{w}_k^{t+1}$ satisfies $\tilde{\mathcal{L}}(\mathbf{

Figures (1)

  • Figure 1: Test accuracy versus communication rounds in the DIGIT-NIID-1 setting. UNIDEAL improves accuracy in fewer rounds than the other baselines and, in the later rounds, reaches higher accuracy while remaining stable.

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1: Convergence