Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets

Apolline Mellot, Antoine Collas, Sylvain Chevallier, Denis Engemann, Alexandre Gramfort

TL;DR

This work proposes an unsupervised approach leveraging EEG signal physics to map EEG channels to fixed positions using field interpolation, facilitating source-free domain adaptation and demonstrating robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.

Abstract

Combining electroencephalogram (EEG) datasets for supervised machine learning (ML) is challenging due to session, subject, and device variability. ML algorithms typically require identical features at train and test time, which complicates analysis when the number and positions of sensors vary across datasets. Simple channel selection discards valuable data, leading to poorer performance, especially with datasets sharing few channels. To address this, we propose an unsupervised approach leveraging EEG signal physics. We map EEG channels to fixed positions using field interpolation, facilitating source-free domain adaptation. Leveraging Riemannian geometry classification pipelines and transfer learning steps, our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications. Comparative analysis against a statistics-based approach known as Dimensionality Transcending, a signal-based imputation called ComImp, source-dependent methods, as well as common channel selection and spherical spline interpolation, was conducted with leave-one-dataset-out validation on six public BCI datasets for a right-hand/left-hand classification task. Numerical experiments show that when the train and test sets share few channels, field interpolation consistently outperforms other methods, demonstrating enhanced classification performance across all datasets. When more channels are shared, field interpolation was found to be competitive with other methods and faster to compute than source-dependent methods.
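
The evaluation protocol and the common-channel baseline named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the dataset names, the montages, and both helper functions are hypothetical placeholders, chosen only to show how leave-one-dataset-out splits are formed and why intersecting channel sets discards data.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    dataset_names: List[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (train_datasets, target_dataset) pairs, one per dataset."""
    for target in dataset_names:
        train = [name for name in dataset_names if name != target]
        yield train, target


def common_channels(montages: Dict[str, List[str]], names: List[str]) -> List[str]:
    """Return the sorted channels present in every listed dataset."""
    shared = set(montages[names[0]])
    for name in names[1:]:
        shared &= set(montages[name])
    return sorted(shared)


# Hypothetical montages (assumption): three toy datasets, not the paper's six.
montages = {
    "dataset_A": ["C3", "Cz", "C4", "Fz", "Pz"],
    "dataset_B": ["C3", "Cz", "C4"],
    "dataset_C": ["C3", "C4", "FC1", "FC2"],
}

splits = list(leave_one_dataset_out(list(montages)))
shared_all = common_channels(montages, list(montages))

for train, target in splits:
    print(f"train on {train}, test on {target}")
print("channels kept by common-channel selection:", shared_all)
```

In this toy setting only two of the seven distinct channels survive the intersection, which is the kind of data loss that motivates mapping every dataset to a fixed set of positions instead.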

Paper Structure (16 sections, 6 equations, 3 figures)

Figures (3)

  • Figure 1: Top: table of diverse characteristics of the datasets. Bottom: 2D projection of sensor positions on the scalp.
  • Figure 2: Processing pipeline of EEG data. Depending on the method, dimensions are matched either when data are represented as epochs, for the interpolation methods, or as covariances, for Dimensionality Transcending (DT). When no alignment is performed, the Re-Center step is removed.
  • Figure 3: Accuracy for different dimensionality matching methods on three left-out datasets. Each column (across panels A and B) corresponds to one target dataset. (A) Comparative learning curves for an increasing number of target channels seen during training, with field interpolation (FI) performance as reference. The number of seen target channels is increased by gradually adding datasets to the train set, as specified on the x-axis. Error bars represent the 95% confidence interval over the target subjects' performance. (B) Boxplots of accuracies when the classifier is trained on the five other datasets. Each point corresponds to one subject of the target dataset. In each box, a black line marks the median and a white circle marks the mean. The black lines indicate the chance level. The stars represent the results of a Wilcoxon test (ns: $p > 5\mathrm{e}{-2}$, *: $1\mathrm{e}{-2} < p \leq 5\mathrm{e}{-2}$, ****: $p \leq 1\mathrm{e}{-4}$).
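
Figure 2 names a Re-Center step but this summary does not define it. The sketch below assumes the usual meaning in covariance-based transfer learning: each domain's covariance matrices are whitened by that domain's mean covariance, so that every domain ends up centered on the identity. For brevity it uses the arithmetic mean, which is a simplification swapped in here; a Riemannian pipeline would more likely use a Riemannian mean. The function names and the random data are illustrative only.

```python
import numpy as np


def inv_sqrtm_spd(mat: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector (column) by 1/sqrt(eigenvalue), then recompose.
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T


def recenter(covs: np.ndarray) -> np.ndarray:
    """Whiten a stack of covariances, shape (n, c, c), by their mean.

    Assumption: the arithmetic mean stands in for the domain's reference
    point; this is a simplification, not the paper's stated procedure.
    """
    mean_cov = covs.mean(axis=0)
    whitener = inv_sqrtm_spd(mean_cov)
    # Broadcasting applies W @ C_i @ W to every matrix in the stack.
    return whitener @ covs @ whitener


# Illustrative random data: 20 epochs, 4 channels, 200 time samples.
rng = np.random.default_rng(0)
signals = rng.standard_normal((20, 4, 200))
covs = signals @ signals.transpose(0, 2, 1) / signals.shape[-1]

recentered = recenter(covs)
# With the arithmetic mean, the mean of the re-centered stack is the identity.
print(np.allclose(recentered.mean(axis=0), np.eye(4)))
```

Applying the same function separately to each dataset uses no labels, which is consistent with the unsupervised setting described in the abstract.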