ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters

Reese Kneeland; Wangshu Jiang; Ugo Bruzadin Nunes; Paul Steven Scotti; Arnaud Delorme; Jonathan Xu

TL;DR

ENIGMA addresses the practical barriers to EEG-to-Image decoding with a unified multi-subject pipeline: a shared spatio-temporal backbone, lightweight latent alignment layers, and a CLIP-based projection to image space, followed by diffusion-based reconstruction. With as little as 15 minutes of per-subject calibration data and fewer than 1% of the trainable parameters of previous approaches, ENIGMA achieves state-of-the-art reconstructions on THINGS-EEG2 and Alljoined-1.6M across subjects and across research-grade and consumer-grade hardware, while maintaining strong single-subject performance. Behavioral evaluations with 545 human raters confirm that the reconstructions carry meaningful perceptual content, and ablations identify the latent alignment layers and the backbone as the components most important for generalization. The work points toward practical, edge-deployable EEG-to-Image BCIs that adapt rapidly for assistive, clinical, and consumer applications. Its limitations are that it has not been scaled beyond constrained image-reconstruction tasks, and that it remains open how far adding more subjects can raise the performance ceiling.

Abstract

To be practical for real-life applications, models for brain-computer interfaces must be easily and quickly deployable on new subjects, effective on affordable scanning hardware, and small enough to run locally on accessible computing resources. To directly address these limitations, we introduce ENIGMA, a multi-subject electroencephalography (EEG)-to-Image decoding model that reconstructs seen images from EEG recordings and achieves state-of-the-art (SOTA) performance on the research-grade THINGS-EEG2 and consumer-grade Alljoined-1.6M benchmarks, while fine-tuning effectively on new subjects with as little as 15 minutes of data. ENIGMA has a simpler architecture than previous approaches and requires less than 1% of their trainable parameters. Our approach integrates a subject-unified spatio-temporal backbone with a set of multi-subject latent alignment layers and an MLP projector to map raw EEG signals to a rich visual latent space. We evaluate our approach using a broad suite of image reconstruction metrics that have been standardized in the adjacent field of fMRI-to-Image research, and ours is the first EEG-to-Image study to conduct extensive behavioral evaluations of its reconstructions using human raters. Our simple and robust architecture provides a significant performance boost across both research-grade and consumer-grade EEG hardware, and a substantial improvement in fine-tuning efficiency and inference cost. Finally, we provide extensive ablations to determine the architectural choices most responsible for our performance gains in both single-subject and multi-subject cases across multiple benchmark datasets. Collectively, our work provides a substantial step towards the development of practical brain-computer interface applications.
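The data flow described in the abstract (shared spatio-temporal backbone, then per-subject latent alignment, then an MLP projector) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: every layer size, the use of a single linear map per subject for alignment, the ReLU nonlinearities, and all function and parameter names are hypothetical. The intermediate names `x_eeg`, `z_eeg`, and `c_eeg` follow the Figure 2 caption; the final diffusion step that turns `c_eeg` into an image is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
N_CHANNELS, N_TIMES = 8, 50        # EEG electrodes x time samples per trial
KERNEL, N_FILTERS = 5, 4           # temporal kernel width, number of filters
LATENT_DIM, HIDDEN_DIM = 16, 64    # backbone latent size, MLP hidden size
EMBED_DIM = 32                     # size of the image-embedding target
N_SUBJECTS = 3


def init(*shape):
    """Small random weights; a stand-in for trained parameters."""
    return rng.normal(scale=0.1, size=shape)


params = {
    # Shared across all subjects.
    "temporal": init(N_FILTERS, KERNEL),
    "spatial": init(N_FILTERS, N_CHANNELS),
    "to_latent": init(N_FILTERS * (N_TIMES - KERNEL + 1), LATENT_DIM),
    "proj_1": init(LATENT_DIM, HIDDEN_DIM),
    "proj_2": init(HIDDEN_DIM, EMBED_DIM),
    # One small alignment matrix per subject (assumed linear here).
    "align": [init(LATENT_DIM, LATENT_DIM) for _ in range(N_SUBJECTS)],
}


def backbone(eeg, p):
    """Shared pathway: EEG of shape (channels, times) -> latent x_eeg."""
    # Temporal convolution: slide each filter along time within each channel.
    windows = np.lib.stride_tricks.sliding_window_view(eeg, KERNEL, axis=1)
    temporal = np.einsum("ctk,fk->fct", windows, p["temporal"])
    # Spatial convolution: each filter mixes all channels into one trace.
    spatial = np.einsum("fct,fc->ft", temporal, p["spatial"])
    features = np.maximum(spatial, 0.0).reshape(-1)  # ReLU, then flatten
    return features @ p["to_latent"]


def decode(eeg, subject, p):
    """EEG trial -> embedding c_eeg that an image generator would consume."""
    x_eeg = backbone(eeg, p)                  # shared across subjects
    z_eeg = x_eeg @ p["align"][subject]       # subject-specific alignment
    hidden = np.maximum(z_eeg @ p["proj_1"], 0.0)
    return hidden @ p["proj_2"]               # shared MLP projector output


trial = rng.normal(size=(N_CHANNELS, N_TIMES))  # synthetic stand-in for EEG
c_eeg = decode(trial, subject=1, p=params)
print(c_eeg.shape)  # (32,)
```

In this sketch only the `align` entry grows with the number of subjects, which is one plausible reading of why adapting to a new subject can be cheap: the backbone and projector are reused and only a small per-subject map needs fitting.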

Paper Structure (23 sections, 10 equations, 12 figures, 2 tables)

Introduction
Related Work
ENIGMA
Methodology
Datasets
Architecture
Results
Quantitative Evaluations
Human Behavioral Evaluations
Fine Tuning Efficiency
Ablation Study
Discussion
Current Limitations
Appendix
Data Processing and Format
...and 8 more sections

Figures (12)

Figure 1: Comparison of (A) ATM-S [Li2024] and (B) ENIGMA (ours) in model size and multi-subject capabilities on the THINGS-EEG2 [Gifford2022] (green cap, 10 subjects) and Alljoined-1.6M [Alljoined-1.6M] (red cap, 20 subjects) datasets. (C) Comparison of ENIGMA and ATM-S across training data scale.
Figure 2: During training, brain activity from each subject is passed through a shared pathway of spatio-temporal convolutions, producing an intermediate latent vector $x_{\textrm{EEG}}$ for all subjects. This vector is then passed through a set of subject-specific latent alignment layers to produce an aligned latent embedding $z_{\textrm{EEG}}$. The aligned latent is passed through a fully connected MLP projection layer to produce the output vector $c_{\textrm{EEG}}$, which is reconstructed into an image using SDXL. Details of these procedures are provided in Section \ref{architecture}.
Figure 3: Qualitative comparison of reconstruction methods on seen stimuli from THINGS-EEG2 and Alljoined-1.6M. For each method and stimulus, the reconstruction shown is the sampled output with the highest scores on all of the image feature metrics in Table \ref{table:featuremetrics}.
Figure 4: Scaling efficiency of ENIGMA with (red) and without (blue) pretraining on other subjects, and of ATM-S [Li2024] (green). Performance is plotted against varying amounts of target-subject training/fine-tuning data on a log-scale x-axis. Reconstruction accuracy, evaluated using the normalized average of the feature metrics presented in Table \ref{table:featuremetrics}, is plotted on the y-axis. All metrics are calculated on the median subject (subject 2) of the THINGS-EEG2 dataset.
Figure 5: Ablation analyses: model variants (numbered icons) in single-subject (square) and multi-subject (circle) configurations under each ablation type (color) are assessed via the normalized average of all feature metrics (Table \ref{table:featuremetrics}), with THINGS-EEG2 performance on the x-axis and Alljoined-1.6M performance on the y-axis.
...and 7 more figures