Multi-Branch Mutual-Distillation Transformer for EEG-Based Seizure Subtype Classification

Ruimin Peng, Zhenbang Du, Changming Zhao, Jingwei Luo, Wenzhong Liu, Xinxing Chen, Dongrui Wu

TL;DR

This work tackles cross-subject EEG-based seizure subtype classification under data scarcity. It introduces the MBMD Transformer, which combines multi-branch encoder blocks with a mutual-distillation objective. The objective transfers knowledge between the raw EEG data and six wavelet-based frequency bands obtained via Wavelet Packet Decomposition, using a shared loss with a bi-directional KL term. Key contributions include six-branch expert FFNs with branch-wise attention, a mutual-distillation loss with temperature $T$, and a total objective $\mathcal{L}=\mathcal{L}_{ce}+\mathcal{L}_{distill}+\lambda\mathcal{L}_{norm}$ (with $\lambda=0.01$). On the CHSZ and TUSZ datasets, MBMD outperforms traditional machine learning and other deep learning models, indicating improved cross-subject seizure subtype classification and practical potential for epilepsy diagnostics with limited labeled data.
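The total objective above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default temperature of 2.0, the averaging over bands, and the treatment of $\mathcal{L}_{norm}$ as an externally supplied scalar are all assumptions made for illustration. Only the three-term structure, the bi-directional KL term, and $\lambda=0.01$ come from the summary.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_distill_loss(raw_logits, band_logits, temperature=2.0):
    """Bi-directional KL between softened raw-EEG and wavelet-band predictions.

    Assumption: the two directions are simply summed; the paper's exact
    weighting is not given in this summary.
    """
    p = softmax(raw_logits, temperature)
    q = softmax(band_logits, temperature)
    return kl_divergence(p, q) + kl_divergence(q, p)

def cross_entropy(logits, label):
    """Standard cross-entropy of the raw-EEG prediction against the true class."""
    return -math.log(softmax(logits)[label])

def total_loss(raw_logits, band_logits_list, label, norm_term,
               lam=0.01, temperature=2.0):
    """L = L_ce + L_distill + lam * L_norm.

    Assumptions: L_distill is averaged over the wavelet bands, and
    norm_term (L_norm) is computed elsewhere and passed in as a scalar.
    """
    ce = cross_entropy(raw_logits, label)
    distill = sum(
        mutual_distill_loss(raw_logits, b, temperature) for b in band_logits_list
    ) / len(band_logits_list)
    return ce + distill + lam * norm_term
```

When the raw and band predictions agree exactly, the distillation term vanishes and the objective reduces to the cross-entropy plus the weighted norm term.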

Abstract

Cross-subject electroencephalogram (EEG) based seizure subtype classification is important for precise epilepsy diagnostics. Deep learning is a promising solution because it can automatically extract latent patterns. However, it usually requires a large amount of training data, which may not always be available in clinical practice. This paper proposes the Multi-Branch Mutual-Distillation (MBMD) Transformer for cross-subject EEG-based seizure subtype classification, which can be trained effectively from a small amount of labeled data. The MBMD Transformer replaces all even-numbered encoder blocks of the vanilla Vision Transformer with our designed multi-branch encoder blocks. A mutual-distillation strategy is proposed to transfer knowledge between the raw EEG data and its wavelets of different frequency bands. Experiments on two public EEG datasets demonstrated that the proposed MBMD Transformer outperformed several traditional machine learning and state-of-the-art deep learning approaches. To our knowledge, this is the first work on knowledge distillation for EEG-based seizure subtype classification.
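The block replacement described in the abstract can be sketched as a simple layout rule. This is an illustrative sketch only: the function name, the string labels, and the 1-indexed block numbering are assumptions, and the encoder depth is a hypothetical parameter not stated in the abstract.

```python
def encoder_layout(depth):
    """Return the block type at each position of a ViT-style encoder.

    Assumption: blocks are numbered from 1, so blocks 2, 4, 6, ... are the
    ones replaced by multi-branch encoder blocks.
    """
    return [
        "multi_branch" if i % 2 == 0 else "vanilla"
        for i in range(1, depth + 1)
    ]

# Hypothetical 4-block encoder:
print(encoder_layout(4))
# ['vanilla', 'multi_branch', 'vanilla', 'multi_branch']
```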

Paper Structure

This paper contains 19 sections, 8 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: (a) Knowledge distillation; and, (b) mutual learning.
  • Figure 2: Self-distillation strategies. (a) Data augmentation; and, (b) auxiliary structure.
  • Figure 3: A vanilla ViT for EEG signal classification.
  • Figure 4: MBMD Transformer with mutual-distillation. (a) Training and test; (b) the overall structure; and, (c) auxiliary data processing (take $\delta$ wave as an example).
  • Figure 5: WPD of 128 Hz EEG signal.
  • ...and 5 more figures