WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities

Ziyi Zeng; Zhenyang Cai; Yixi Cai; Xidong Wang; Junying Chen; Rongsheng Wang; Yipeng Liu; Siqi Cai; Benyou Wang; Zhiguo Zhang; Haizhou Li

TL;DR

WaveMind advances EEG research by unifying EEG with textual and visual modalities in a shared semantic space, enabling conversational interpretation of brain signals. It introduces WaveMind-Instruct-338k for instruction tuning and a three-stage training paradigm (Encoder Representation Alignment, Cold-Start for CLIP Space Adaptability, EEG Instruction Tuning) to achieve cross-modal grounding and dialogue capability. The model demonstrates robust classification and open-ended conversational performance across multiple brain tasks, with analyses highlighting complementarity between cognitive and brain-state signals. By releasing open data and benchmarks, this work provides a foundation for general-purpose EEG models and practical brain-computer interface research.

Abstract

Interpreting electroencephalography (EEG) with multimodal large language models (MLLMs) offers a novel approach to analyzing brain signals. However, the complexity of brain activity poses a critical challenge: EEG signals simultaneously encode cognitive processes and intrinsic neural states, which creates a mismatch in the modality of the data paired with EEG and hinders effective cross-modal representation learning. Through a pilot investigation, we uncover complementary relationships between these modalities. Leveraging this insight, we propose mapping EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation. To fully enable conversational capabilities, we further introduce WaveMind-Instruct-338k, the first cross-task EEG dataset for instruction tuning. The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations across four downstream tasks, offering valuable insights for both neuroscience research and the development of general-purpose EEG models.
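
To make the idea of a unified semantic space concrete, the sketch below shows one common way to align two sets of embeddings: a symmetric, CLIP-style contrastive loss. This is an illustration under assumptions, not the objective used by WaveMind, which the abstract does not specify. The function names, the temperature value, and the toy random data are all hypothetical; `eeg_emb` stands in for EEG encoder outputs and `target_emb` for embeddings of the paired image or text.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(eeg_emb, target_emb, temperature=0.07):
    # Hypothetical alignment objective: row i of eeg_emb is assumed to be
    # paired with row i of target_emb; all other rows act as negatives.
    eeg = l2_normalize(eeg_emb)
    tgt = l2_normalize(target_emb)
    logits = eeg @ tgt.T / temperature
    idx = np.arange(logits.shape[0])
    loss_e2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # EEG -> target
    loss_t2e = -log_softmax(logits, axis=0)[idx, idx].mean()  # target -> EEG
    return 0.5 * (loss_e2t + loss_t2e)

# Toy data (assumption): 8 pairs of 16-dimensional embeddings.
rng = np.random.default_rng(0)
target = rng.normal(size=(8, 16))
aligned = target + 0.01 * rng.normal(size=(8, 16))  # nearly matching pairs
shuffled = aligned[::-1]                            # deliberately mismatched pairs

loss_aligned = symmetric_contrastive_loss(aligned, target)
loss_shuffled = symmetric_contrastive_loss(shuffled, target)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when each EEG-side embedding sits next to its paired target embedding than when the pairs are shuffled, which is the behavior an alignment objective of this kind is meant to reward.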
