NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis

Romeo Lanzino; Federico Fontana; Luigi Cinque; Francesco Scarcello; Atsuto Maki

TL;DR

A key feature of the Neural Transcoding Vision Transformer is its Domain Matching (DM) sub-module, which aligns the latent EEG representations with those of fMRI volumes, enhancing the model's accuracy and reliability.

Abstract

This paper introduces the Neural Transcoding Vision Transformer (NT-ViT), a generative model designed to estimate high-resolution functional Magnetic Resonance Imaging (fMRI) samples from simultaneous Electroencephalography (EEG) data. A key feature of NT-ViT is its Domain Matching (DM) sub-module, which aligns the latent EEG representations with those of fMRI volumes, enhancing the model's accuracy and reliability. Previous methods tend to struggle with the fidelity and reproducibility of images; NT-ViT addresses these challenges by ensuring methodological integrity and higher-quality reconstructions, which we showcase through extensive evaluation on two benchmark datasets. NT-ViT outperforms the current state-of-the-art by a significant margin in both cases, e.g. achieving a $10\times$ reduction in RMSE and a $3.14\times$ increase in SSIM on the Oddball dataset. An ablation study also provides insights into the contribution of each component to the model's overall effectiveness. This development offers a new approach to lessening the time and financial constraints typically linked with high-resolution brain imaging, thereby aiding in the swift and precise diagnosis of neurological disorders. NT-ViT is not a replacement for actual fMRI but rather a step towards making such imaging more accessible; nevertheless, we believe that it represents a pivotal advancement in clinical practice and neuroscience research. Code is available at https://github.com/rom42pla/ntvit.

Paper Structure (19 sections, 7 equations, 2 figures, 2 tables)

Introduction
Related work
Method
Generator
Domain Matching
Losses
Experiments
Experimental design
Datasets
Evaluation metrics and schemes
Implementation details
Results
Ablation study
Modules and losses
Optimization and regularization
...and 4 more sections

Figures (2)

Figure 1: (a) The model features two core modules called Generator and Domain Matching (DM). Initially, the Generator converts EEG waveforms into Mel spectrograms using a Spectrogrammer to encapsulate frequency information. An Encoder then processes these spectrograms, extracting a latent representation that a Decoder employs to reconstruct the fMRI volume. The DM module, active only during training, optimizes the Generator's efficacy by extracting a latent representation of the actual fMRI volume, which is aligned with the latent EEG representation. The dashed lines indicate where losses are calculated. (b) The Encoder processes input volumes (either fMRI or spectrograms) by dividing them into 3D patches, converting these into tokens, and processing them through a Transformer to yield a single token representing the input's latent features. (c) The Decoder is conditioned on a single latent token, which a Transformer uses to generate a volume.
Figure 2: Examples of reconstructions made by the two configurations of our model NT-ViT. The results are represented in both PC and MIP formats. In both visual representations, areas appearing redder indicate higher oxygenation levels and greater activity, whereas the bluer regions signify lower oxygenation and reduced activity.
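
The patching step described in Figure 1(b) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_into_3d_patches`, the non-overlapping layout, the patch size, and the nested-list volume are all assumptions made for illustration. It only shows how a volume can be cut into 3D patches and flattened; the learned token embedding and the Transformer are omitted.

```python
from itertools import product


def split_into_3d_patches(volume, patch_size):
    """Split a nested-list volume of shape (D, H, W) into flattened 3D patches.

    Hypothetical helper for illustration only. Each patch is returned as a
    flat list of patch_size**3 values; in a real model a learned projection
    would then map each flat patch to a token.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % patch_size or height % patch_size or width % patch_size:
        raise ValueError("every dimension must be divisible by patch_size")

    patches = []
    # Visit the starting corner of each non-overlapping patch.
    for d0, h0, w0 in product(
        range(0, depth, patch_size),
        range(0, height, patch_size),
        range(0, width, patch_size),
    ):
        # Flatten the patch in (depth, height, width) order.
        patch = [
            volume[d][h][w]
            for d in range(d0, d0 + patch_size)
            for h in range(h0, h0 + patch_size)
            for w in range(w0, w0 + patch_size)
        ]
        patches.append(patch)
    return patches


# Toy 4x4x4 volume whose voxel value encodes its own (d, h, w) position.
toy_volume = [
    [[100 * d + 10 * h + w for w in range(4)] for h in range(4)]
    for d in range(4)
]
toy_patches = split_into_3d_patches(toy_volume, patch_size=2)
```

With a 4x4x4 toy volume and a patch size of 2, the sketch yields 8 patches of 8 voxels each. The same routine would apply to either input type named in the caption (fMRI volumes or spectrograms), provided each is arranged as a 3D array whose sides are divisible by the patch size.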
