DIVER-1 : Deep Integration of Vast Electrophysiological Recordings at Scale

Danny Dongyeop Han, Yonghyeon Gwon, Ahhyun Lucy Lee, Taeyang Lee, Seong Jin Lee, Jubin Choi, Sebin Lee, Jihyun Bang, Seungju Lee, David Keetae Park, Shinjae Yoo, Chun Kee Chung, Jiook Cha

TL;DR

This work tackles scaling electrophysiology foundation models (EFMs) for EEG and iEEG, domains where data are scarce and heterogeneous. It introduces DIVER-1, a family of self-supervised EFMs with architectural innovations such as any-variate attention, sliding temporal conditional positional encoding, spatio-temporal register tokens, and multi-domain reconstruction, trained on the largest electrophysiology corpora to date. The authors demonstrate data-constrained scaling laws for EFMs, showing that, at fixed compute, smaller models trained longer achieve better performance, and provide IsoLoss guidance for compute budgeting. DIVER-1 achieves state-of-the-art downstream decoding on iEEG and EEG benchmarks and exhibits robust generalization across modalities and dataset shifts, with ablations confirming the value of each architectural component. The study offers practical guidance for scaling EFMs and points to future directions in cross-subject learning and data-efficient fine-tuning to broaden applicability in neuroscience and clinical contexts.

Abstract

Electrophysiology signals such as EEG and iEEG are central to neuroscience, brain-computer interfaces, and clinical applications, yet existing foundation models remain limited in scale despite clear evidence that scaling improves performance. We introduce DIVER-1, a family of EEG and iEEG foundation models trained on the largest and most diverse corpus to date (5.3k hours of iEEG and 54k hours of EEG; 1.6M channel-hours from over 17.7k subjects) and scaled up to 1.82B parameters. We present the first systematic scaling law analysis for this domain, showing that these models follow data-constrained scaling laws: for a given amount of data and compute, smaller models trained for extended epochs consistently outperform larger models trained briefly. This behavior contrasts with prior electrophysiology foundation models that emphasized model size over training duration. To achieve strong performance, we also design architectural innovations including any-variate attention, sliding temporal conditional positional encoding, and multi-domain reconstruction. DIVER-1 iEEG and EEG models each achieve state-of-the-art performance on their respective benchmarks, establishing concrete guidelines for efficient scaling and resource allocation in electrophysiology foundation model development.
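
The fixed-compute trade-off described in the abstract can be illustrated with a rough sketch. It assumes the common rule of thumb that training a transformer costs about 6 FLOPs per parameter per token seen; that approximation, the function name `epochs_for_budget`, and all numbers below are illustrative assumptions, not values or formulas taken from the paper.

```python
# Assumption: training FLOPs ~= 6 * params * tokens_seen. This is a widely
# used rule of thumb for transformers, not a formula from the DIVER-1 paper.
FLOPS_PER_PARAM_TOKEN = 6


def epochs_for_budget(compute_flops: float, n_params: float, unique_tokens: float) -> float:
    """Return how many passes over the dataset a compute budget affords."""
    tokens_seen = compute_flops / (FLOPS_PER_PARAM_TOKEN * n_params)
    return tokens_seen / unique_tokens


# Hypothetical budget and dataset size, chosen only for illustration.
budget = 6e18
unique_tokens = 1e9

for n_params in (1e7, 1e8, 1e9):
    epochs = epochs_for_budget(budget, n_params, unique_tokens)
    print(f"{n_params:.0e} parameters -> {epochs:g} epochs")
```

Under this approximation, a tenfold smaller model can afford ten times as many epochs for the same budget. The sketch shows only the accounting; which point on that trade-off gives the lowest loss is the empirical question the scaling analysis addresses.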

Paper Structure

This paper contains 48 sections, 13 equations, 13 figures, 26 tables.

Figures (13)

  • Figure 1: Overview of DIVER-1 architecture and pretraining. DIVER-1 is pretrained on a large EEG and iEEG data corpus. After preprocessing, input patches are randomly masked and enhanced by adding modality, spectral, and CNN-based patch embeddings, along with STCPE. The enhanced patches are processed through MOIRAI blocks and trained to reconstruct missing patches across multiple signal domains (time series, spectrum, spectrogram). The pretrained model is then applied to diverse downstream tasks.
  • Figure 2: Scaling laws and downstream performance of DIVER-1. (a-h) Scaling law validation: DIVER-1 follows data-constrained scaling laws across four dimensions for iEEG (a-d) and EEG (e-h) modalities. Loss decreases predictably with increased (a,e) compute (training FLOPs), (b,f) dataset size (number of tokens), (c,g) model size (parameters), and (d,h) training epochs, with strong log-log fits. iEEG experiments (a-d) used 100% of the dataset, while EEG experiments used 20% of the dataset for (e,h) and 100% for (f,g). (i-o) Downstream performance: Performance for iEEG (i-l) and EEG (m-o) across increasing (i) number of subjects while keeping dataset size identical, (j,m) dataset size, (k,n) model size, and (l,o) epochs. (p) Compute-optimal frontier (IsoLoss analysis): Comparison between empirical and predicted IsoLoss contours, with model configurations plotted to show the relationship between training epochs and model parameters under fixed compute budgets. (q) Neuroprobe benchmark results: Comprehensive performance (AUROC) comparison across multiple neural decoding tasks, with $\textsc{DIVER}_{\text{Tiny}/\text{I}/0.1s}$ achieving state-of-the-art or competitive results on most tasks. $\textsc{DIVER}_{\text{Tiny}/\text{I}/0.1s}$ with $d_{\text{model}}=256$ and patch size 0.1s was pretrained on the iEEG dataset for 32 epochs, past the compute-optimal frontier for best performance. Performance with linear probing (red) and full finetuning (blue) is shown. (r, s) iEEG downstream performance: (r) Neuroprobe multi-label classification results using $\textsc{DIVER}_{\text{Small}/\text{I}/0.1s}$. (s) MAYO (seizure detection task) results using $\textsc{DIVER}_{\text{Small}/\text{I}/1s}$. (t) EEG downstream performance: DIVER-1 showed competitive performance compared to other EEG foundation models (CBraMod and LaBraM-base) on the FACED, PhysioNet-MI, and MentalArithmetic datasets. Results shown are obtained using full finetuning. The DIVER model refers to $\textsc{DIVER}_{\text{Small}/\text{IE}/1s}$ with $d_{\text{model}}=512$ and patch size 1s pretrained on iEEG and EEG datasets for 16 epochs. Other baseline results are replicated using their official code. Performance values for CBraMod and LaBraM are reported from their original publications.
  • Figure 3: Verification of the $\mu$P implementation. The L1 norm of activation vectors (y-axis) is plotted against model width (x-axis) for five training timesteps (t=1 to t=5) across four different widths ($256, 512, 768, 1024$). (Top Row) With standard parameterization, activation norms are unstable and diverge as model width increases. (Bottom Row) In contrast, our $\mu$P implementation yields stable activation norms that are independent of model width. This confirms the model is correctly parameterized, a critical prerequisite for successful hyperparameter transfer via $\mu$Transfer.
  • Figure 4: Loss curves of the $\textsc{DIVER}_{\text{-}/\text{I}/1s}$ model family. Test loss across epochs is shown.
  • Figure 5: Loss curves of the $\textsc{DIVER}_{\text{-}/\text{I}/0.1s}$ model family. Test loss across epochs is shown.
  • ...and 8 more figures
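
The "log-log fits" mentioned in the Figure 2 caption can be sketched as an ordinary least-squares line in log-log space, which corresponds to a plain power law y = a * x^b. This is a minimal illustration under that assumption: the functional form the authors actually fit is not given in this section, the helper `fit_power_law` is hypothetical, and the data points are synthetic rather than results from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b using a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line is the power-law exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept of the regression line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic (compute, loss) pairs generated from a known power law,
# used only to check that the fit recovers its parameters.
compute = [1e15, 1e16, 1e17, 1e18]
loss = [2.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted prefactor a = {a:.4f}, exponent b = {b:.4f}")
```

A negative exponent means loss falls as the resource grows, and its magnitude sets how fast. Data-constrained scaling laws typically add terms for repeated data, so a single straight line like this is only a first-order check.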