WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities

Ziyi Zeng, Zhenyang Cai, Yixi Cai, Xidong Wang, Junying Chen, Rongsheng Wang, Yipeng Liu, Siqi Cai, Benyou Wang, Zhiguo Zhang, Haizhou Li

TL;DR

WaveMind advances EEG research by unifying EEG with textual and visual modalities in a shared semantic space, enabling conversational interpretation of brain signals. It introduces WaveMind-Instruct-338k for instruction tuning and a three-stage training paradigm (Encoder Representation Alignment, Cold-Start for CLIP Space Adaptability, EEG Instruction Tuning) to achieve cross-modal grounding and dialogue capability. The model demonstrates robust classification and open-ended conversational performance across multiple brain tasks, with analyses highlighting complementarity between cognitive and brain-state signals. By releasing open data and benchmarks, this work provides a foundation for general-purpose EEG models and practical brain-computer interface research.

Abstract

Electroencephalography (EEG) interpretation using multimodal large language models (MLLMs) offers a novel approach for analyzing brain signals. However, the complex nature of brain activity introduces critical challenges: EEG signals simultaneously encode both cognitive processes and intrinsic neural states, creating a mismatch in EEG paired-data modality that hinders effective cross-modal representation learning. Through a pivot investigation, we uncover complementary relationships between these modalities. Leveraging this insight, we propose mapping EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation. To fully enable conversational capabilities, we further introduce WaveMind-Instruct-338k, the first cross-task EEG dataset for instruction tuning. The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations across four downstream tasks, thereby offering valuable insights for both neuroscience research and the development of general-purpose EEG models.

Paper Structure

This paper contains 99 sections, 7 equations, 10 figures, 21 tables, 3 algorithms.

Figures (10)

  • Figure 1: Overall illustration of WaveMind with support for downstream tasks. Left: The model is compatible with more upstream data through a shared space. Right: Examples of EEG interpretation across various downstream tasks.
  • Figure 2: Comparison of Two Typical Alignment Methods with Ours. The proposed method improves adaptability to upstream data and generalization to downstream tasks.
  • Figure 3: Instruction Construction Pipeline of WaveMind. The raw EEG signals are first pre-processed into segments with the same configuration, then passed through different instruction construction processes depending on the label type. Four types of instructions are constructed so that the model learns diverse knowledge.
  • Figure 4: The Overall Architecture of WaveMind. Left: three-stage training procedure. Right: inference procedure of WaveMind. The system projects EEG data into a unified semantic space and integrates retrieval-augmented generation (RAG) for robust language generation.
  • Figure 5: Conversational Assessment with Varied Cue Granularity. (A) Cognitive evaluation, where WaveMind's responses are compared with image captions from the visual stimuli. "w/Obj" indicates that the model is given an object cue of k possible options, exactly one of which is correct, and that this cue is incorporated directly into the input prompt; (B) Brain State evaluation, where GPT-4o is used to determine whether WaveMind's responses contain the correct category from the clinical annotation.
  • ...and 5 more figures
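
The Figure 4 caption describes projecting EEG data into a unified semantic space and using retrieval to support generation. The sketch below illustrates the general idea of retrieval in a shared embedding space, not the authors' implementation: it assumes an EEG segment has already been encoded into a vector and ranks candidate category embeddings by cosine similarity. The function names, the toy 3-dimensional vectors, and the labels are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_top_k(eeg_embedding, candidates, k=2):
    """Return the k (label, score) pairs most similar to the EEG embedding.

    `candidates` maps a label to its embedding in the same shared space
    (for example, a text or image feature). This mapping is an assumption
    made for illustration only.
    """
    scored = [
        (label, cosine_similarity(eeg_embedding, emb))
        for label, emb in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, made-up embeddings; real embeddings would come from trained encoders.
candidates = {
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
    "seizure": [0.0, 0.0, 1.0],
}
eeg_embedding = [0.8, 0.2, 0.1]

top = retrieve_top_k(eeg_embedding, candidates, k=2)
for label, score in top:
    print(f"{label}: {score:.3f}")
```

In a retrieval-augmented setup of this kind, the retrieved labels could then be placed in the language model's prompt as candidate cues, which is similar in spirit to the "w/Obj" setting described for Figure 5.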