FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

Pengchao Han; Xingyan Shi; Jianwei Huang

TL;DR

FedAL addresses knowledge transfer among multiple clients that hold heterogeneous, private data and black-box models. It combines an adversarial discriminator at the server with less-forgetting (LF) regularization. The min-max game aligns client outputs on an unlabeled public dataset, while LF regularization preserves cross-client knowledge during local and global updates. The paper provides theoretical generalization and convergence guarantees. Across diverse datasets and heterogeneity levels, it reports higher accuracy and lower communication overhead than existing federated KD baselines. The approach enables efficient, architecture-agnostic collaboration in federated settings where neither raw data nor model architectures are shared, which matters for privacy-preserving collaborative learning.
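The Python sketch below makes the min-max game concrete by computing one plausible value function on public data. It is a hypothetical setup, since this page does not give FedAL's exact objective. The server discriminator is assumed to be a linear softmax classifier that reads one client's output probability vector and predicts which of the N clients produced it. The discriminator ascends this value and each client descends it. Outputs that cannot be told apart are therefore the clients' best response. The names `discriminator` and `minmax_value` are illustrative only.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def discriminator(weights, p):
    """Hypothetical linear discriminator (an assumption, not FedAL's model).

    weights: N rows of C weights; p: one client's probability vector (length C).
    Returns a probability distribution over the N client identities.
    """
    logits = [sum(w_c * p_c for w_c, p_c in zip(row, p)) for row in weights]
    return softmax(logits)

def minmax_value(weights, client_outputs):
    """Average log-probability the discriminator assigns to the true source client.

    client_outputs[n][k] is client n's probability vector on public sample k.
    The discriminator maximizes this value; clients try to minimize it.
    """
    total, count = 0.0, 0
    for n, outputs in enumerate(client_outputs):
        for p in outputs:
            total += math.log(discriminator(weights, p)[n])
            count += 1
    return total / count
```

Consider two clients and weights `[[5, -5], [-5, 5]]`. Clearly separated outputs such as `[0.9, 0.1]` and `[0.1, 0.9]` give a value above -log 2. Identical outputs cap every discriminator at -log 2 or below, by concavity of the logarithm. In this sense, consensus outputs neutralize the discriminator.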

Abstract

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. In federated KD, each client updates its local model using the average model output (or feature) of all client models as the target. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address data heterogeneity among clients. First, to alleviate the divergence of local model outputs across clients caused by data heterogeneity, the server acts as a discriminator. Through a min-max game between the clients and the discriminator, it guides clients' local training toward consensus model outputs. Moreover, catastrophic forgetting may occur during clients' local training and global knowledge transfer because of clients' heterogeneous local data. To address this challenge, we design less-forgetting regularization for both local training and global knowledge transfer, which guarantees clients' ability to transfer knowledge to, and learn knowledge from, others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
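In federated KD as described in the abstract, the distillation target for each public sample is the average of all clients' outputs. The Python sketch below builds that consensus target and adds a less-forgetting penalty. The abstract does not specify the form of the LF term, so the sketch assumes one for illustration only. It uses a KL divergence that keeps a client's new outputs close to a frozen snapshot of its own model, taken before the update. The penalty is scaled by a hypothetical `lf_weight`. The helper names are illustrative, not the paper's code.

```python
import math

def average_outputs(client_outputs):
    """Consensus target: element-wise mean of all clients' probability
    vectors on each public sample (the federated KD target)."""
    num_clients = len(client_outputs)
    num_samples = len(client_outputs[0])
    num_classes = len(client_outputs[0][0])
    return [
        [sum(client_outputs[n][k][c] for n in range(num_clients)) / num_clients
         for c in range(num_classes)]
        for k in range(num_samples)
    ]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two probability vectors; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kd_loss_with_lf(student, target, snapshot, lf_weight=0.5):
    """Distillation loss toward the consensus target plus an ASSUMED
    less-forgetting penalty toward a frozen snapshot of the same client.

    student, target, snapshot: lists of per-sample probability vectors.
    """
    distill = sum(kl(t, s) for t, s in zip(target, student)) / len(target)
    forget = sum(kl(o, s) for o, s in zip(snapshot, student)) / len(snapshot)
    return distill + lf_weight * forget
```

Averaging `[0.8, 0.2]` and `[0.4, 0.6]` gives the target `[0.6, 0.4]`. A client that still outputs `[0.8, 0.2]` then pays a positive distillation loss. Its LF term stays zero as long as its snapshot equals its current output.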

Paper Structure (26 sections, 13 theorems, 91 equations, 11 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Problem Formulation and Preliminaries
Problem Formulation
Federated KD
Proposed Algorithm: FedAL
FedAL Framework
Min-Max Game Formulation
Less-Forgetting Regularization
Objectives
FedAL Algorithm
Theoretical Analysis
Communication Overhead
Generalization Bound
Convergence Analysis
...and 11 more sections

Key Result

Lemma 4.2

Given fixed client models $\boldsymbol{\Theta}$, the discriminator's best-response choice $\boldsymbol{w}^{*}(\boldsymbol{\Theta})$, which maximizes the min-max objective (eq:min-max-obj) for any input sample $\boldsymbol{x}$, satisfies for all $n \in \mathcal{N}$, where $\left[h\left(\cdot\right)\right]_n$ denotes the $n$th element of $h$.

Figures (11)

Figure 1: Framework of FedAL. The components in the red dashed region are those that FedAL adds and FedMD lacks.
Figure 2: An example of the average model output probability distribution of the final models trained using FedMD and FedAL, for data samples labeled "3" in the SVHN dataset. The value of $\alpha$ captures the heterogeneity of the clients' data, and each entry of the matrix is the predicted probability of the input data for a given class. We distribute the whole SVHN dataset to 20 clients as their local datasets according to a Dirichlet distribution with parameter $\alpha$. A smaller $\alpha$ indicates larger heterogeneity of local data across clients.
Figure 3: Model parameter updates of FedAL in each training round.
Figure 4: Data distributions of SVHN for different $\alpha$. Left to right: $\alpha = 5$, 2, and 1.
Figure 5: Federated KD prototype system.
...and 6 more figures
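Figures 2 and 4 rely on a Dirichlet partition of SVHN across 20 clients, but the captions do not spell out the procedure. A common construction is assumed here. Each class's per-client shares are drawn from a symmetric Dirichlet($\alpha$), and that class's samples are split accordingly. The standard-library Python sketch below follows that construction. `dirichlet_partition` is a hypothetical helper name.

```python
import random

def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients (assumed per-class scheme).

    For each class, client shares follow Dirichlet(alpha); a smaller alpha
    concentrates each class on fewer clients (more heterogeneity).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    parts = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(idxs))))
        bounds = [0] + cuts + [len(idxs)]
        for n in range(num_clients):
            parts[n].extend(idxs[bounds[n]:bounds[n + 1]])
    return parts
```

Calling `dirichlet_partition(labels, num_clients=20, alpha=1.0)` returns 20 disjoint index lists that together cover every sample. Lowering `alpha` skews each client toward fewer classes, matching the captions' note that a smaller $\alpha$ means larger heterogeneity.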

Theorems & Definitions (23)

Lemma 4.2: Discriminator's best response
Lemma 4.3: Client's best response
Theorem 4.4: Equilibrium
proof
Lemma 5.2
Theorem 5.3: Generalization bound
Theorem 5.8: Convergence error of FedAL
Proposition 8.1
Theorem 8.2: Uniform Convergence Understanding
Theorem 8.3: Domain adaptation ben2010theory
...and 13 more