NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals

Wei-Bang Jiang, Yansen Wang, Bao-Liang Lu, Dongsheng Li

TL;DR

This work presents NeuroLM, a universal multi-task foundation model that treats EEG signals as a language for Large Language Models. It combines a text-aligned neural tokenizer trained by vector-quantized temporal-frequency prediction, multi-channel autoregressive pre-training with a frozen VQ encoder, and multi-task instruction tuning to unify diverse EEG tasks. Evaluated on six datasets, NeuroLM achieves competitive multi-task performance, with larger variants offering stronger gains and instruction tuning enabling generalization to novel prompts. The approach advances EEG-BCI by enabling a single model to handle multiple tasks with reduced task-specific fine-tuning, and points to future improvements through larger LLMs, mixture-of-experts, and finer EEG-text alignment.

Abstract

Recent advances in large-scale pre-training with neural signals such as electroencephalogram (EEG) have shown promising results, significantly boosting the development of brain-computer interfaces (BCIs) and healthcare. However, these pre-trained models often require full fine-tuning on each downstream task to achieve substantial improvements, limiting their versatility and usability, and leading to considerable resource wastage. To tackle these challenges, we propose NeuroLM, the first multi-task foundation model that leverages the capabilities of Large Language Models (LLMs) by regarding EEG signals as a foreign language, endowing the model with multi-task learning and inference capabilities. Our approach begins with learning a text-aligned neural tokenizer through vector-quantized temporal-frequency prediction, which encodes EEG signals into discrete neural tokens. These EEG tokens, generated by the frozen vector-quantized (VQ) encoder, are then fed into an LLM that learns causal EEG information via multi-channel autoregression. Consequently, NeuroLM can understand both EEG and language modalities. Finally, multi-task instruction tuning adapts NeuroLM to various downstream tasks. We are the first to demonstrate that, by specifically incorporating LLMs, NeuroLM unifies diverse EEG tasks within a single model through instruction tuning. The largest variant, NeuroLM-XL, has a record-breaking 1.7B parameters for EEG signal processing and is pre-trained on a large-scale corpus comprising approximately 25,000 hours of EEG data. When evaluated on six diverse downstream datasets, NeuroLM showcases the huge potential of this multi-task learning paradigm.
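
As a rough illustration of how a vector-quantized encoder can turn continuous EEG patch embeddings into discrete neural tokens, the sketch below assigns each embedding to its nearest codebook vector. The function name `quantize`, the squared-L2 distance, and the toy codebook are assumptions made for illustration; the abstract does not specify the codebook size, the distance measure, or the encoder that produces the embeddings.

```python
from typing import List, Sequence


def quantize(embeddings: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each embedding to the index of its nearest codebook vector.

    Hypothetical helper: nearest-neighbour lookup under squared L2 distance,
    the standard assignment step in vector quantization.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    tokens = []
    for emb in embeddings:
        # Index of the codebook entry closest to this embedding.
        nearest = min(range(len(codebook)),
                      key=lambda k: sq_dist(emb, codebook[k]))
        tokens.append(nearest)
    return tokens


# Toy data (assumed): three 2-D codebook vectors and three patch embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
patch_embeddings = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

neural_tokens = quantize(patch_embeddings, codebook)
print(neural_tokens)
```

The resulting integer indices play the role of the discrete EEG tokens that a language model can consume alongside text tokens.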

Paper Structure

This paper contains 27 sections, 8 equations, 11 figures, and 12 tables.

Figures (11)

  • Figure 1: Comparison on six tasks.
  • Figure 2: The architecture design of text-aligned neural tokenizer training. The neural tokenizer is trained by reconstructing both the temporal and frequency domains of the input EEG signals, which it discretizes into neural tokens. To align the EEG and text embedding spaces, we utilize a domain classifier through adversarial training.
  • Figure 3: Schematic of NeuroLM training. Left: We first pre-train NeuroLM via multi-channel autoregression with EEG tokens output by the frozen VQ encoder. Right: The multi-task instruction tuning enables NeuroLM to perform various BCI tasks within a single model.
  • Figure 4: The stair-stepping mask. Each row shows the attention mask for one EEG token.
  • Figure 5: Ablation study on whether to shuffle the options in the instructions.
  • ...and 6 more figures
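
The Figure 4 caption gives only the name of the stair-stepping mask. One plausible reading, sketched below, is a block-causal mask for multi-channel autoregression: a token may attend to every channel at its own time step and at all earlier time steps, but to nothing later. The function name `stair_step_mask`, the time-major token ordering, and the block-causal rule itself are assumptions for illustration, not details confirmed by the text above.

```python
from typing import List


def stair_step_mask(num_channels: int, num_steps: int) -> List[List[bool]]:
    """Build a block-causal attention mask (hypothetical reading of Figure 4).

    Assumes tokens are ordered time step by time step, so token i belongs
    to time step i // num_channels. mask[i][j] is True when token i is
    allowed to attend to token j.
    """
    n = num_channels * num_steps
    return [
        [(j // num_channels) <= (i // num_channels) for j in range(n)]
        for i in range(n)
    ]


# Toy example (assumed sizes): 2 channels, 3 time steps -> 6 tokens.
mask = stair_step_mask(num_channels=2, num_steps=3)
for row in mask:
    # Each printed row is the attention mask for one EEG token.
    print("".join("x" if allowed else "." for allowed in row))
```

Printed row by row, the allowed region grows in steps of `num_channels` columns, which is one way such a mask could earn the "stair-stepping" name.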