Deep Riemannian Networks for End-to-End EEG Decoding

Daniel Wilson; Robin Tibor Schirrmeister; Lukas Alexander Wilhelm Gemein; Tonio Ball

TL;DR

This study shows how to design and train Deep Riemannian Networks (DRNs) to infer task-related information from raw EEG without the need for handcrafted filterbanks, and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.

Abstract

State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning (DL) or Riemannian-Geometry-based decoders (RBDs). Recently, there has been growing interest in Deep Riemannian Networks (DRNs), which may combine the advantages of both classes of methods. However, there is still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks are transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork on these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on five public EEG datasets and compared with state-of-the-art ConvNets. Here we propose EE(G)-SPDNet, and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel-specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible under-utilisation of Riemannian-specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from raw EEG without the need for handcrafted filterbanks, and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.

Paper Structure (68 sections, 26 equations, 30 figures, 6 tables)

Introduction
Methods
Outline
Riemannian Methods Overview
End-to-End EEG SPDNet
Optimised Filterbank EEG SPDNet
Network Design Choices
Optimiser
Network Width and Channel Specificity
SPD Matrix Operations
Vectorisation
Concatenation
Interband Covariance
Regularisation
Analyses
...and 53 more sections

Figures (30)

Figure 1: Diagram showing the architecture of EE(G)-SPDNet. Input trials are convolved in a convolutional layer, which mimics band-pass filtering. An SCM pooling layer turns the convolved time series into sample covariance matrices (SCMs), which are symmetric positive definite (SPD). These matrices are then passed into an SPDNet to produce class predictions. Some layers have been omitted from the diagram for clarity: the EE(G)-SPDNet used for computations has 3 pairs of BiMap-ReEig layers, and the BiMap and ReEig layers are separate layers. For more detail on the SPDNet components, see Huang and Van Gool (2016).
Figure 2: Diagram showing the architecture of FBSPDNet. The training set is passed to a Bayesian optimiser (BO), where it is used to optimise the filterbank. The optimal filterbank is then used for the entire dataset. An SPD estimator (in this case the SCM) is used to create SPD matrices, which are passed to the SPDNet for classification.
Figure 3: Final Evaluation Set Results. These figures show the results of our models (EEGSPDNet and FBSPDNet) and the comparison models on the evaluation datasets. Details of the comparison models and of the datasets can be found in the corresponding Methods sections (Comparison Models and Datasets). As discussed previously, the variants of our proposed models are an 8-filter, channel-specific EE(G)-SPDNet (EEGSPDNet chspec) and a 6-filter, channel-independent FBSPDNet (FBSPDNet ChInd). On the left is an estimation plot, which is made up of two subplots. The upper subplot is a swarm plot, with the models on the x-axis and the test-set classification accuracy on the y-axis. Each point represents a single participant training-testing loop from a single dataset (hue denotes the dataset) for a given model. To the left of each model's swarm is a gapped line showing the swarm mean +/- standard deviation. There are 115 points in each swarm, which is the number of participants across all datasets. The lower subplot shows the bootstrapped ($n=10000$) mean difference between the left-most model (EEGSPDNet chspec) and every other model. The shaded area shows the distribution of the bootstrapped differences, with the dot and line showing the mean and 95% confidence interval, respectively. On the right is a heatmap displaying mean accuracy differences (not bootstrapped, unlike in the estimation plot) and asterisks for significance thresholds (one, two or three asterisks denote p-values below 0.05, 0.01 and 0.001, respectively). Each cell in the heatmap shows the row-minus-column accuracy difference.
Figure 4: Validation Set Classification Accuracy Comparison for Different Conditions. From left to right, the subplots show: FBSPDNet against EE(G)-SPDNet, with marker denoting channel-independent (ChInd) or channel-specific (ChSpec) filtering; channel-independent against channel-specific filtering, with marker denoting EE(G)-SPDNet or FBSPDNet; and removed interband covariance against retained interband covariance, with marker denoting regular convolutional filtering or sinc-convolutional filtering. Data have been averaged across seeds and participants, and separated by number of filters (via colour). Means, standard deviations and p-values for the three subplots can be found in the corresponding comparison tables (FBSPDNet vs EEGSPDNet, channel-independent vs channel-specific, and interband covariance). The similarity between many p-values arises from the relatively small sample size (7 or 8 values) and from the ranked sums often being identical (i.e. all points favour a particular condition).
Figure 5: Average Frequency Gain Caused by Learned Convolutional Filters of EE(G)-SPDNet. Each subplot shows the average frequency gain (in decibels) across the spectrum. The first three subplots (from top to bottom) show the different model sub-types, with colour denoting the number of filters. The bottom subplot shows the average (with standard deviation in shading) across all filters for each model sub-type. Frequency gains were normalised to the range 0 to 1 to allow for better inter-model comparisons. All data used to generate this figure were collected during the validation phase.
...and 25 more figures
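
The pipeline described in the Figure 1 caption (temporal convolution, SCM pooling, then three BiMap-ReEig pairs) can be sketched in NumPy as a forward pass only. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random kernels, the layer sizes (8, 6, 4, 3), the sharing of BiMap weights across filters, the LogEig step and the upper-triangle vectorisation are all hypothetical choices made for the example, and the final classification layer is omitted.

```python
import numpy as np

def temporal_conv(x, kernels):
    """Filter every channel with every kernel (channel-independent filtering).

    x: (n_channels, n_times); kernels: (n_filters, kernel_len).
    Returns an array of shape (n_filters, n_channels, n_times - kernel_len + 1).
    """
    return np.array([[np.convolve(ch, k, mode="valid") for ch in x]
                     for k in kernels])

def scm(x):
    """Sample covariance matrix of a (n_channels, n_times) signal."""
    xc = x - x.mean(axis=1, keepdims=True)
    return xc @ xc.T / (x.shape[1] - 1)

def bimap(s, w):
    """BiMap: bilinear map W^T S W with a semi-orthogonal W of shape (d_in, d_out)."""
    return w.T @ s @ w

def reeig(s, eps=1e-4):
    """ReEig: rectify eigenvalues by clamping them from below at eps."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.maximum(vals, eps)) @ vecs.T

def logeig(s):
    """LogEig: matrix logarithm of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.log(vals)) @ vecs.T

def forward(x, kernels, weights, eps=1e-4):
    """Map one EEG trial to a feature vector (classifier omitted)."""
    feats = []
    for band in temporal_conv(x, kernels):
        s = scm(band)
        for w in weights:                 # one BiMap-ReEig pair per weight matrix
            s = reeig(bimap(s, w), eps)
        log_s = logeig(s)
        feats.append(log_s[np.triu_indices_from(log_s)])  # assumed vectorisation
    return np.concatenate(feats)

# Synthetic example: 8 channels, 500 samples, 4 random kernels (all hypothetical).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 500))
kernels = rng.standard_normal((4, 25))
dims = [8, 6, 4, 3]                       # three BiMap-ReEig pairs
weights = [np.linalg.qr(rng.standard_normal((i, o)))[0]
           for i, o in zip(dims[:-1], dims[1:])]
features = forward(x, kernels, weights)
```

With these sizes each filter yields a 3 x 3 matrix, so the feature vector has 4 x 6 = 24 entries. In a trained network the kernels and BiMap weights would be learned, with the BiMap weights constrained to stay semi-orthogonal.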
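
The bootstrapped mean difference and 95% confidence interval described in the Figure 3 caption can be illustrated as follows. This is a sketch under assumptions rather than the procedure used in the paper: it resamples paired per-participant differences, uses a percentile interval, and takes the difference as reference minus other. The accuracies below are synthetic placeholders, not results from the paper; only the 115 points per model and the 10000 resamples come from the caption.

```python
import numpy as np

def bootstrap_mean_diff(acc_ref, acc_other, n_boot=10_000, ci=95, seed=0):
    """Bootstrap the mean paired accuracy difference (reference minus other).

    Returns the observed mean difference, the lower and upper percentile
    bounds of the confidence interval, and the bootstrap distribution.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(acc_ref) - np.asarray(acc_other)
    # Resample participants with replacement, n_boot times.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    tail = (100 - ci) / 2
    lo, hi = np.percentile(boot_means, [tail, 100 - tail])
    return diffs.mean(), lo, hi, boot_means

# Synthetic per-participant accuracies for two hypothetical models.
rng = np.random.default_rng(1)
acc_a = rng.uniform(0.5, 0.9, size=115)
acc_b = acc_a - 0.02 + rng.normal(0.0, 0.03, size=115)
mean_diff, ci_lo, ci_hi, boot = bootstrap_mean_diff(acc_a, acc_b)
```

The array `boot` corresponds to the shaded distribution in an estimation plot, and `mean_diff` with `(ci_lo, ci_hi)` to the dot and line.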
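
The frequency gain in decibels of a convolutional kernel, as plotted in Figure 5, can be estimated from the magnitude of the kernel's discrete Fourier transform. The sketch below is illustrative only: the sampling rate, kernel length, FFT size and min-max normalisation are assumptions, and the kernel is a hand-made windowed 10 Hz cosine standing in for a learned filter.

```python
import numpy as np

def frequency_gain_db(kernel, fs, n_fft=1024):
    """Return frequencies (Hz) and gain (dB) of a 1-D FIR kernel."""
    spectrum = np.fft.rfft(kernel, n=n_fft)        # zero-padded to n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Floor the magnitude to avoid log10(0).
    gain_db = 20.0 * np.log10(np.maximum(np.abs(spectrum), 1e-12))
    return freqs, gain_db

def minmax_normalise(values):
    """Rescale values linearly to the range 0 to 1 (assumed normalisation)."""
    return (values - values.min()) / (values.max() - values.min())

fs = 250.0                                         # hypothetical sampling rate
t = np.arange(125) / fs
kernel = np.hanning(125) * np.cos(2 * np.pi * 10.0 * t)
freqs, gain_db = frequency_gain_db(kernel, fs)
gain_norm = minmax_normalise(gain_db)
peak_hz = freqs[np.argmax(gain_db)]
```

Averaging `gain_norm` over all kernels of a model would give a curve comparable in spirit to the bottom subplot of Figure 5.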
