Privacy-Preserving Federated Learning with Consistency via Knowledge Distillation Using Conditional Generator

Kangyang Luo; Shuai Wang; Xiang Li; Yunshi Lan; Ming Gao; Jinlong Shu

TL;DR

This work tackles privacy leakage in Federated Learning with FedMD-CG, which shares a conditional generator in place of each client's feature extractor during server-side aggregation and enforces consistency between local generators and classifiers through knowledge distillation at both the latent-feature and logit levels. It introduces a two-stage client-side distillation and a crossed server-side distillation scheme, augmented with diversity constraints to prevent mode collapse, all without training extra discriminators. Empirical results on EMNIST, FMNIST, and CIFAR-10 show FedMD-CG delivering competitive accuracy while providing stronger privacy protection than full-sharing baselines, and ablations support the effectiveness of the proposed distillation and aggregation strategies. The approach offers a practical pathway to high-performance, privacy-conscious FL in heterogeneous data settings, with directions for efficiency improvements and scaling.

Abstract

Federated Learning (FL) is gaining popularity as a distributed learning framework that shares only model parameters or gradient updates and keeps private data local. However, FL is at risk of privacy leakage caused by privacy inference attacks, and most existing privacy-preserving mechanisms in FL conflict with achieving high performance and efficiency. Therefore, we propose FedMD-CG, a novel FL method with highly competitive performance and high-level privacy preservation, which decouples each client's local model into a feature extractor and a classifier, and utilizes a conditional generator instead of the feature extractor to perform server-side model aggregation. To ensure the consistency of local generators and classifiers, FedMD-CG leverages knowledge distillation to train local models and generators at both the latent feature level and the logit level. We also construct additional classification losses and design new diversity losses to enhance client-side training. FedMD-CG is robust to data heterogeneity and does not require training extra discriminators (as cGAN does). We conduct extensive experiments on various image classification tasks to validate the superiority of FedMD-CG.
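The two distillation levels named in the abstract can be illustrated with a minimal Python sketch. The loss forms used here (mean squared error on latent features, temperature-softened KL divergence on logits) and all function names are assumptions chosen for illustration; the abstract does not state the paper's exact objectives.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def latent_distillation_loss(teacher_feat, student_feat):
    """Latent-feature level: mean squared error between two feature vectors.

    Assumed form, not taken from the paper.
    """
    if len(teacher_feat) != len(student_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(t - s) ** 2 for t, s in zip(teacher_feat, student_feat)]
    return sum(squared) / len(squared)


def logit_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Logit level: KL(teacher || student) on softened class distributions.

    Assumed form, not taken from the paper.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("logit vectors must have the same length")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical values: a feature from a local feature extractor, the
# generator's output for the same label, and the classifier's logits on each.
extractor_feature = [0.2, -1.0, 0.5]
generator_feature = [0.1, -0.8, 0.7]
logits_on_extractor_feature = [2.0, 0.5, -1.0]
logits_on_generator_feature = [1.5, 0.7, -0.8]

print(latent_distillation_loss(extractor_feature, generator_feature))
print(logit_distillation_loss(logits_on_extractor_feature,
                              logits_on_generator_feature, temperature=2.0))
```

Both quantities are zero when the generator's output and the resulting logits match those of the feature extractor exactly, which is the sense of "consistency" this sketch assumes.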

Paper Structure (19 sections, 12 equations, 15 figures, 17 tables, 1 algorithm)

Introduction
Proposed Method
Client-side Two-stage Distillation
Server-side Crossed Distillation Aggregation
Discussion
Experiments
Implementation Settings
Results Comparison
Ablation Study
Conclusions
Related Work
Pseudocode
Algorithm Description
Computing devices and platforms
Full Experiments
...and 4 more sections

Figures (15)

Figure 1: Illustration of FedMD-CG: (a) The local model update distills the experience from the global generator $G$ to improve the generalization performance of the local model [$F_i$, $D_i$]. (b) The local generator update uses the trained local model [$F_i$, $D_i$] to guide the local generator $G_i$ to mimic the latent feature space. Note that $G$ is not involved in client-side training. (c) The server-side data-free KD aggregation proceeds in a crossed manner to achieve as much knowledge transfer as possible. Best viewed in color. Zoom in for details.
Figure 2: Visualization of the generator output: The toy example first trains a LeNet [LeCun1998Gradient] as the teacher model (T) using the training set of MNIST [LeCun1998Gradient]. The test set of MNIST is then fed to T to obtain the latent features, whose dimensions are reduced by principal component analysis (PCA) [Halko2011Finding]. (a) shows the latent feature distribution of T after PCA dimension reduction. Next, we let T guide the training of the generator according to Eq. (\ref{local_t_gen:}). Similarly, we use PCA to perform dimension reduction on the output of the generator. (b) visualizes the output distribution of the generator without a diversity constraint. (c), (d) and (e) visualize the output distribution of the generator with $\mathcal{L}_{div}^0$, $\mathcal{L}_{div}^1$ and $\mathcal{L}_{div}^2$, respectively.
Figure 3: (a)-(c) are learning curves selected from FedMD-CG as well as baselines over different datasets. (d)-(f) show test performance (%) w.r.t. data heterogeneity over each dataset.
Figure 4: Image reconstruction with DLG attack in FedMD-CG and baselines. From the first to the last row, the images are selected from EMNIST, FMNIST and CIFAR-10 respectively. PSNR (dB) is reported under each recovered image.
Figure 5: The consistency comparison between local generators and classifiers for FedCG and FedMD-CG w.r.t. AVE_agg$^\star$. G+D loss denotes the classification loss of the local classifier on the output of the local generator.
...and 10 more figures
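
Figure 4 reports PSNR (dB) for images reconstructed by the DLG attack. As a reminder of how that metric is computed, here is a minimal Python sketch of the standard PSNR definition on flattened pixel lists. The function name, the flattened-list input, and the default peak value of 1.0 are assumptions for illustration; the caption does not specify the paper's pixel range or channel handling.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Uses the standard definition 10 * log10(max_value^2 / MSE).
    Returns math.inf when the two images are identical.
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("images must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values in [0, 1] for a tiny 4-pixel image.
private_image = [0.0, 0.5, 1.0, 0.25]
attack_output = [0.1, 0.4, 0.8, 0.30]
print(psnr(private_image, attack_output))
```

A lower PSNR means the reconstruction is further from the private image, so in a reconstruction-attack comparison lower values indicate less leakage.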
