
Generalizability Under Sensor Failure: Tokenization + Transformers Enable More Robust Latent Spaces

Geeling Chau, Yujin An, Ahamed Raffey Iqbal, Soon-Jo Chung, Yisong Yue, Sabera Talukder

TL;DR

We collect an electroencephalography dataset and study two time series models, EEGNet and TOTEM. TOTEM outperforms or matches EEGNet across all generalizability cases, and analysis of its latent codebook indicates that tokenization enables generalization.

Abstract

A major goal in neuroscience is to discover neural data representations that generalize. This goal is challenged by variability across recording sessions (e.g. environment), subjects (e.g. varying neural structures), and sensors (e.g. sensor noise), among others. Recent work has begun to address generalization across sessions and subjects, but few studies examine robustness to sensor failure, which is highly prevalent in neuroscience experiments. To address these generalizability dimensions, we first collect our own electroencephalography dataset with numerous sessions, subjects, and sensors, then study two time series models: EEGNet (Lawhern et al., 2018) and TOTEM (Talukder et al., 2024). EEGNet is a widely used convolutional neural network, while TOTEM is a discrete time series tokenizer and transformer model. We find that TOTEM outperforms or matches EEGNet across all generalizability cases. Finally, through analysis of TOTEM's latent codebook, we observe that tokenization enables generalization.

Paper Structure (12 sections, 8 figures, 1 table)


Figures (8)

  • Figure 1: Overview. TOTEM and EEGNet train on data from subject A1, which has no failed sensors. Both TOTEM and EEGNet are then tested on subject B2 with artificially failed sensors. This is an example of cross subject generalizability under sensor failure.
  • Figure 2: (a) Baseline (within session) and generalizability cases (cross subject, cross session, sensor failure). (b) Experimental setup. We test on 2 human subjects, each with 128 electrodes, generating 600 trials per session. Adapted from garcia2023kcs.
  • Figure 3: EEGNet architecture. Adapted from Lawhern et al. (2018).
  • Figure 4: TOTEM architecture and training. (a) Learn latent codebook via self-supervision. (b) Train transformers on tokenized data created from frozen codebook. Adapted from Talukder et al. (2024).
  • Figure 5: Classifier performance under all generalizability cases. (a) TOTEM vs EEGNet accuracy for all within session (black), cross session (red), and cross subject (blue) cases across several amounts of sensor failure (0%, 10%, 30%, 70%). Ovals in the 0% failure case represent standard error of the mean (SEM) across 5 model random seeds. Ovals in the 10%, 30%, and 70% sensor failure plots represent SEM across 3 sensor failure random seeds. (b) Decoding accuracy of TOTEM (solid line) and EEGNet (dashed line) when trained on B1 and tested against within session (black), cross session (red), and cross subject (blue) cases across 0-100% sensor failure percentages. Additional results can be found in the appendix (detailed sensor failure results).
  • ...and 3 more figures
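
The captions for Figures 1 and 5 describe testing on artificially failed sensors and reporting SEM across random seeds. The snippet below is a minimal sketch of that kind of evaluation setup, not the authors' code. It assumes that a failed sensor is modeled by zeroing its channel (the captions do not say how failure is simulated) and that data is laid out as (trials, sensors, timesteps). The function names `fail_sensors` and `sem` are hypothetical.

```python
import numpy as np


def fail_sensors(trials, fraction, seed):
    """Return a copy of `trials` with a random subset of sensors zeroed.

    ASSUMPTION: failure is modeled as an all-zero channel; the source does
    not specify the failure model.
    trials: array of shape (n_trials, n_sensors, n_timesteps)
    fraction: fraction of sensors to fail, between 0 and 1
    seed: sensor-failure random seed
    """
    rng = np.random.default_rng(seed)
    n_sensors = trials.shape[1]
    n_failed = int(round(fraction * n_sensors))
    # Pick distinct sensor indices to fail for this seed.
    failed = rng.choice(n_sensors, size=n_failed, replace=False)
    corrupted = trials.copy()
    corrupted[:, failed, :] = 0.0
    return corrupted, failed


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / np.sqrt(values.size)


# Synthetic stand-in for one session: 600 trials, 128 electrodes.
# The number of timesteps (50) is arbitrary.
session = np.random.default_rng(0).normal(size=(600, 128, 50))

# One corrupted copy per sensor-failure seed, at 30% failure.
corrupted_copies = [fail_sensors(session, 0.30, seed)[0] for seed in range(3)]
```

Each corrupted copy would then be passed to an already-trained classifier, and `sem` applied to the resulting per-seed accuracies to get an error estimate like the ovals in Figure 5.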