Riemannian Geometry-Preserving Variational Autoencoder for MI-BCI Data Augmentation

Viktorija Poļaka; Ivo Pascal de Jong; Andreea Ioana Sburlea

TL;DR

This work introduces and validates the RGP-VAE, a geometry-preserving generative model for EEG covariance matrices, and highlights its potential for signal privacy, scalability and data augmentation.

Abstract

This paper addresses the challenge of generating synthetic electroencephalogram (EEG) covariance matrices for motor imagery brain-computer interface (MI-BCI) applications. Objective: We aim to develop a generative model capable of producing high-fidelity synthetic covariance matrices while preserving their symmetric positive-definite nature. Approach: We propose a Riemannian geometry-preserving variational autoencoder (RGP-VAE) integrating geometric mappings with a composite loss function combining Riemannian distance, tangent space reconstruction accuracy and generative diversity. Results: The model generates valid, representative EEG covariance matrices, while learning a subject-invariant latent space. Synthetic data proves practically useful for MI-BCI, with its impact depending on the paired classifier. Contribution: This work introduces and validates the RGP-VAE as a geometry-preserving generative model for EEG covariance matrices, highlighting its potential for signal privacy, scalability and data augmentation.

Paper Structure (9 equations, 4 figures, 2 tables)

Figures (4)

Figure 1: An overview of the proposed RGP-VAE, illustrating the integration of a standard VAE with geometric operations on the SPD manifold. An input SPD matrix $\mathbf{X}_i$ is first projected onto the tangent space at a reference point $\mathbf{P}_{\text{ref}}$ using the logarithmic map $\text{log}_{\mathbf{P}_{\text{ref}}}$ (Eq. \ref{eq:log_map}). This tangent representation $\mathbf{S}_i$ is then vectorized to serve as the encoder input $\mathbf{H}_{\text{tangent}}$. The encoder maps this input to a latent distribution parameterized by $\boldsymbol{\mu}$ and $\text{log}(\boldsymbol{\sigma}^2)$, from which a latent vector $\mathbf{z}_i$ is sampled and passed to the decoder to produce the reconstructed vector $\mathbf{H}_{\text{decoded}}$. This vector is unvectorized back into a tangent space representation $\hat{\mathbf{S}}_i$, which is finally mapped back onto the SPD manifold via the exponential map $\text{exp}_{\mathbf{P}_{\text{ref}}}$ (Eq. \ref{eq:exp_map}) to produce the reconstructed SPD matrix $\hat{\mathbf{X}}_i$.
Figure 2: 2D UMAP visualization of the latent space of the RGP-VAE for right-hand movement data. Points are colored by Subject ID; their significant overlap indicates the learning of a subject-invariant representation.
Figure 3: Distribution of accuracy improvement for each classifier using the prior generator. The plot shows the percentage point difference between the 'Augmented' and 'Synthetic-Only' conditions relative to the 'Baseline' across all subjects. The red line signifies the mean whilst the blue line is the median.
Figure 4: Distribution of accuracy improvement for each classifier using the posterior generator, showing similar trends to the prior generator but with more pronounced fluctuations.
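
The geometric wrapper described in the caption of Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the standard affine-invariant forms of the logarithmic and exponential maps (the paper's own Eq. log_map and Eq. exp_map are not reproduced in this excerpt), a plain upper-triangular vectorization, and an identity stand-in for the encoder and decoder. All function names (`log_map`, `exp_map`, `vectorize`, `unvectorize`, `random_spd`) are hypothetical.

```python
import numpy as np


def _eig_fn(M, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    M = 0.5 * (M + M.T)  # symmetrize to guard against round-off
    w, V = np.linalg.eigh(M)
    return (V * fn(w)) @ V.T  # V diag(fn(w)) V^T


def log_map(X, P_ref):
    """Project SPD matrix X onto the tangent space at P_ref.

    Assumption: affine-invariant form
    P^{1/2} log(P^{-1/2} X P^{-1/2}) P^{1/2}.
    """
    P_half = _eig_fn(P_ref, np.sqrt)
    P_inv_half = _eig_fn(P_ref, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _eig_fn(P_inv_half @ X @ P_inv_half, np.log) @ P_half


def exp_map(S, P_ref):
    """Map a symmetric tangent matrix S at P_ref back onto the SPD manifold."""
    P_half = _eig_fn(P_ref, np.sqrt)
    P_inv_half = _eig_fn(P_ref, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _eig_fn(P_inv_half @ S @ P_inv_half, np.exp) @ P_half


def vectorize(S):
    """Flatten the upper triangle (assumed scheme; no off-diagonal weighting)."""
    return S[np.triu_indices(S.shape[0])]


def unvectorize(h, n):
    """Rebuild a symmetric n x n matrix from its upper-triangular entries."""
    S = np.zeros((n, n))
    S[np.triu_indices(n)] = h
    return S + np.triu(S, 1).T  # mirror the strict upper triangle


def random_spd(rng, n):
    """Random SPD matrix standing in for an EEG covariance matrix."""
    A = rng.standard_normal((n, 2 * n))
    return A @ A.T / (2 * n) + 1e-3 * np.eye(n)


rng = np.random.default_rng(0)
n = 4
X, P_ref = random_spd(rng, n), random_spd(rng, n)

H_tangent = vectorize(log_map(X, P_ref))  # encoder input
H_decoded = H_tangent.copy()              # identity stand-in for the VAE
X_hat = exp_map(unvectorize(H_decoded, n), P_ref)
```

With the identity stand-in, `X_hat` recovers `X` up to floating-point error. Because the exponential map sends any symmetric matrix to an SPD matrix, the same holds structurally for an arbitrary decoder output: whatever `H_decoded` contains, `X_hat` remains symmetric positive-definite, which is the property the architecture relies on.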