BrainRVQ: A High-Fidelity EEG Foundation Model via Dual-Domain Residual Quantization and Hierarchical Autoregression

Mingzhe Cui, Tao Chen, Yang Jiao, Yiqin Wang, Lei Xie, Yi Pan, Luca Mainardi

TL;DR

This work proposes BrainRVQ, a general-purpose EEG foundation model pre-trained on a large-scale corpus of clinical EEG data, featuring a Dual-Domain Residual Vector Quantization (DD-RVQ) tokenizer that disentangles temporal waveforms and spectral patterns into hierarchical discrete codes.

Abstract

Developing foundation models for electroencephalography (EEG) remains challenging due to the signal's low signal-to-noise ratio and complex spectro-temporal non-stationarity. Existing approaches often overlook the hierarchical latent structure inherent in neural dynamics, leading to suboptimal reconstruction of fine-grained information. In this work, we propose BrainRVQ, a general-purpose EEG foundation model pre-trained on a large-scale corpus of clinical EEG data. Unlike standard masked modeling, BrainRVQ features a Dual-Domain Residual Vector Quantization (DD-RVQ) tokenizer that disentangles temporal waveforms and spectral patterns into hierarchical discrete codes. We further introduce a hierarchical autoregressive pre-training objective that learns to reconstruct these codes in a coarse-to-fine manner, using an importance-guided curriculum masking strategy to prioritize information-rich neural events over background noise. Extensive experiments across 8 diverse downstream datasets demonstrate that BrainRVQ consistently outperforms state-of-the-art baselines, validating its effectiveness in learning robust and generalizable neural representations. Our code and model weights are available at https://github.com/keqicmz/BrainRVQ
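
The core idea of residual vector quantization in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `residual_vector_quantize`, the random (untrained) codebooks, the patch size, and the use of a raw FFT amplitude spectrum as the "spectral" view are all assumptions made for illustration. The actual tokenizer learns its encoders and codebooks; the sketch only shows how each layer quantizes the residual left by the previous one, so that layer 0 carries the coarse code and later layers carry refinements.

```python
import numpy as np

def residual_vector_quantize(x, codebooks):
    """Quantize rows of x (n, d) with a list of codebooks, each of shape (k, d).

    Returns integer codes of shape (n, n_layers) and the reconstruction (n, d),
    which is the sum of the selected codebook entries across layers.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every code: (n, k)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        recon += cb[idx]      # add this layer's contribution
        residual -= cb[idx]   # the next layer quantizes what is left over
        codes.append(idx)
    return np.stack(codes, axis=1), recon

rng = np.random.default_rng(0)

# Hypothetical toy data: 16 EEG patches of 8 samples each.
patches = rng.standard_normal((16, 8))
# Assumed "spectral view": amplitude spectrum of each patch (5 rfft bins).
spectra = np.abs(np.fft.rfft(patches, axis=-1))

def toy_codebooks(dim, scales=(1.0, 0.5, 0.25), size=32):
    # Random stand-ins for learned codebooks; finer layers use smaller scales.
    return [rng.standard_normal((size, dim)) * s for s in scales]

time_books = toy_codebooks(8)
freq_books = toy_codebooks(5)

# Separate residual quantizers for the two domains.
time_codes, time_recon = residual_vector_quantize(patches, time_books)
freq_codes, freq_recon = residual_vector_quantize(spectra, freq_books)
```

Each patch ends up with three codes per domain, one per layer, which matches the Layer-0/1/2 naming used in the figure captions. With trained codebooks the reconstruction error would be expected to shrink as layers are added; with the random codebooks used here that is not guaranteed.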

Paper Structure (61 sections, 29 equations, 13 figures, 20 tables)

Figures (13)

  • Figure 1: The overall architecture of the BrainRVQ framework. (A) DD-RVQ for EEG Tokenization: The module employs a DD-RVQ mechanism to discretize EEG signals. It extracts features simultaneously in time and frequency domains and is optimized via a joint objective of reconstruction and embedding loss. (B) Pre-training Stage: We introduce a Hierarchical Autoregressive Masked Modeling objective. The model learns to predict residual tokens in a coarse-to-fine manner, guided by an Importance-Guided Curriculum Masking strategy that prioritizes high-information regions. (C) Downstream Adaptation: The pre-trained encoder serves as a general-purpose feature extractor.
  • Figure 2: Visualization of ablation study results. The proposed Full Model (dark blue) consistently outperforms single-domain and non-hierarchical variants across all datasets. Left: AUROC scores for Mental Workload and CHB-MIT. Right: Cohen's Kappa scores for TUEV and BCICIV-2a.
  • Figure 3: DD-RVQ tokenizer training dynamics. The total loss (black) comprises time-domain waveform reconstruction (blue), frequency-domain amplitude reconstruction (orange), and phase reconstruction (green).
  • Figure 4: Pre-training loss dynamics. Layer-0 (coarse) achieves consistently lower loss than Layer-1/2 (fine), validating the coarse-to-fine learning hierarchy.
  • Figure 5: Layer-wise token prediction accuracy during pre-training. Layer-0 achieves the highest accuracy ($\sim$29%), followed by Layer-1 ($\sim$22%) and Layer-2 ($\sim$14%), confirming that coarse codes are more predictable than fine-grained residuals.
  • ...and 8 more figures
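
The Figure 1 caption mentions an importance-guided curriculum masking strategy that prioritizes high-information regions. The page does not say how importance is measured or how the curriculum is scheduled, so the following is only a sketch under stated assumptions: per-patch variance stands in for importance, and a hypothetical `sharpness` parameter controls how strongly sampling favors important patches (0 gives uniform masking). Ramping `sharpness` up over training steps would be one way to form a curriculum.

```python
import numpy as np

def importance_guided_mask(patches, mask_ratio, sharpness, rng):
    """Sample a boolean mask over patches, favoring high-importance patches.

    patches:    array of shape (n, d)
    mask_ratio: fraction of patches to mask
    sharpness:  0 gives uniform sampling; larger values concentrate the
                mask on high-importance patches (assumed parameterization)
    """
    n = patches.shape[0]
    n_mask = int(round(mask_ratio * n))
    # Assumed importance proxy: signal variance within each patch.
    importance = patches.var(axis=1)
    score = importance / (importance.max() + 1e-12)  # scale to [0, 1]
    logits = sharpness * score
    probs = np.exp(logits - logits.max())            # stable softmax
    probs /= probs.sum()
    # Weighted sampling without replacement.
    idx = rng.choice(n, size=n_mask, replace=False, p=probs)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

rng = np.random.default_rng(0)
patches = rng.standard_normal((20, 8))
patches[:5] *= 10.0  # make the first five patches high-variance "events"

# Hypothetical curriculum: uniform masking early, importance-weighted later.
early_mask = importance_guided_mask(patches, mask_ratio=0.5, sharpness=0.0, rng=rng)
late_mask = importance_guided_mask(patches, mask_ratio=0.5, sharpness=5.0, rng=rng)
```

Both masks cover the same number of patches; only the sampling distribution changes between the early and late settings.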