FedL2T: Personalized Federated Learning with Two-Teacher Distillation for Seizure Prediction

Jionghao Lou, Jian Zhang, Zhongmei Li, Lanlan Chen, Enbo Feng

TL;DR

FedL2T tackles seizure prediction with privacy-preserving personalized federated learning by introducing a two-teacher distillation framework that combines a global transfer model with dynamically assigned peer teachers. The method employs adaptive multi-level distillation, including cross-client and mutual learning, along with a proximal regularization term to stabilize training under non-IID EEG data. Empirical results on CHB-MIT and Renji show superior accuracy, faster convergence, and robustness to limited labels compared with state-of-the-art FL methods. The approach demonstrates strong potential for real-world clinical deployment where data privacy and heterogeneity are critical considerations.

Abstract

Training deep learning models for seizure prediction requires large amounts of electroencephalogram (EEG) data, but acquiring sufficient labeled EEG data is difficult due to annotation costs and privacy constraints. Federated Learning (FL) enables privacy-preserving collaborative training by sharing model updates instead of raw data. However, because of inherent inter-patient variability in real-world scenarios, existing FL-based seizure prediction methods struggle to achieve robust performance under heterogeneous client settings. To address this challenge, we propose FedL2T, a personalized federated learning framework that uses a novel two-teacher knowledge distillation strategy to generate superior personalized models for each client. Specifically, each client learns simultaneously from a globally aggregated model and a dynamically assigned peer model, promoting more direct and enriched knowledge exchange. To ensure reliable knowledge transfer, FedL2T employs an adaptive multi-level distillation strategy that aligns both prediction outputs and intermediate feature representations based on task confidence. In addition, a proximal regularization term constrains personalized model updates, enhancing training stability. Extensive experiments on two EEG datasets demonstrate that FedL2T consistently outperforms state-of-the-art FL methods, particularly under low-label conditions. Moreover, FedL2T converges rapidly and stably, reducing the number of communication rounds and the associated overhead. These results underscore the potential of FedL2T as a reliable and personalized solution for seizure prediction in privacy-sensitive healthcare scenarios.
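
To make the ingredients named in the abstract concrete, the following is a minimal sketch of how a two-teacher objective with multi-level distillation and a proximal term could be assembled for a single sample. It is not the paper's formulation: the confidence weight (the teacher's probability on the true label), the temperature, the feature weight `beta`, the coefficient `mu`, and the choice of `anchor_weights` for the proximal term are all illustrative assumptions, and every function name is hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits, label):
    """Supervised loss of the student on the true label."""
    return -math.log(softmax(logits)[label])


def distill_term(student, teacher, label, temperature=2.0, beta=1.0):
    """Output-level plus feature-level distillation from one teacher.

    ASSUMPTION: the term is scaled by the teacher's probability on the
    true label, standing in for "based on task confidence".
    """
    confidence = softmax(teacher["logits"])[label]
    p_teacher = softmax(teacher["logits"], temperature)
    p_student = softmax(student["logits"], temperature)
    output_loss = kl_divergence(p_teacher, p_student)
    feature_loss = mse(student["feat"], teacher["feat"])
    return confidence * (output_loss + beta * feature_loss)


def proximal_term(weights, anchor_weights, mu=0.01):
    """(mu / 2) * ||w - w_anchor||^2; the anchor choice is an assumption."""
    return 0.5 * mu * sum((w - a) ** 2 for w, a in zip(weights, anchor_weights))


def personalized_loss(student, global_teacher, peer_teacher, label,
                      weights, anchor_weights, mu=0.01):
    """Supervised loss + two distillation terms + proximal regularizer."""
    return (
        cross_entropy(student["logits"], label)
        + distill_term(student, global_teacher, label)
        + distill_term(student, peer_teacher, label)
        + proximal_term(weights, anchor_weights, mu)
    )
```

Each model is represented here as a dict with `"logits"` and `"feat"` entries for one sample. When both teachers agree exactly with the student and the weights sit at the anchor, the distillation and proximal terms vanish and only the supervised loss remains, which is a quick sanity check on the sketch.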

Paper Structure

This paper contains 21 sections, 9 equations, 3 figures, 6 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Comparison of traditional FL and the proposed FedL2T framework. (a) Traditional FL relies on a central server for model aggregation, limiting personalization under heterogeneous data. (b) FedL2T introduces an additional peer teacher for distillation, enabling both global and cross-client knowledge transfer to promote learning diversity.
  • Figure 2: Overview of the proposed FedL2T framework. Each client maintains a personalized model $\boldsymbol{P}_k$ and a transfer model $\boldsymbol{T}_k$. During local training, FedL2T performs two complementary knowledge distillation processes: Adaptive Mutual Learning (AML) between $\boldsymbol{P}_k$ and $\boldsymbol{T}_k$, and Adaptive Cross Learning (ACL) from a peer model $\boldsymbol{P}_c$ assigned via a cross-client queue $\mathcal{Q}^r$. Both soft predictions and intermediate features are leveraged for multi-level distillation. The transfer models $\boldsymbol{T}_k$ are periodically synchronized with the global model $\boldsymbol{T}^r_G$ aggregated on the server, enabling global coordination without sharing raw data. Colored arrows indicate the direction of knowledge transfer. Flame and snowflake icons respectively indicate updated and frozen parameters, with the latter excluded from gradient updates during training.
  • Figure 3: Comparison of classification accuracy (%) across communication rounds on the CHB-MIT (left) and Renji (right) datasets. Each curve represents the average test accuracy of a federated method across clients.
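
The server-side steps described in the Figure 2 caption (aggregating the transfer models into a global model and assigning each client a peer through a cross-client queue) can be sketched as follows. The captions do not specify the aggregation rule or how the queue is built, so both parts are assumptions: the aggregation is a sample-size-weighted average in the style of FedAvg, and the queue is a hypothetical round-dependent rotation chosen only so that no client is assigned itself. Function names are illustrative.

```python
def aggregate_transfer_models(client_weights, client_sizes):
    """Sample-size-weighted average of flat weight vectors.

    ASSUMPTION: FedAvg-style weighting; the source only says the global
    model is aggregated on the server.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def assign_peers(client_ids, round_idx):
    """Map each client to a peer teacher for the given round.

    ASSUMPTION: a simple rotation with a nonzero, round-dependent offset,
    so every client gets a different peer than itself and the pairing
    changes across rounds.
    """
    n = len(client_ids)
    if n < 2:
        raise ValueError("peer assignment needs at least two clients")
    offset = 1 + (round_idx % (n - 1))  # always in 1..n-1, never 0
    return {
        cid: client_ids[(i + offset) % n]
        for i, cid in enumerate(client_ids)
    }


# Usage: three clients, two rounds.
global_model = aggregate_transfer_models(
    [[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3]
)
peers_round_0 = assign_peers(["a", "b", "c"], round_idx=0)
peers_round_1 = assign_peers(["a", "b", "c"], round_idx=1)
```

Only model weights and client identifiers appear in this exchange, which matches the caption's point that coordination happens without sharing raw data.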