BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding

Jinzhao Zhou, Yiqun Duan, Fred Chang, Thomas Do, Yu-Kai Wang, Chin-Teng Lin

TL;DR

BELT-2 is the first work to adopt byte-pair encoding (BPE)-level EEG-language alignment and to integrate multi-task training and decoding in the EEG domain, and the first in the field capable of decoding coherent and readable sentences from non-invasive brain signals.

Abstract

The remarkable success of large language models (LLMs) across various multi-modality applications is well established. However, integrating LLMs with humans, or brain dynamics, remains relatively unexplored. In this paper, we introduce BELT-2, a pioneering multi-task model designed to enhance both encoding and decoding performance from EEG signals. To bolster the quality of the EEG encoder, BELT-2 is the first work to 1) adopt byte-pair encoding (BPE)-level EEG-language alignment and 2) integrate multi-task training and decoding in the EEG domain. Inspired by the idea of "Bridging the Brain with GPT", we further connect the multi-task EEG encoder with LLMs by applying prefix-tuning to intermediary output from the EEG encoder. These efforts make BELT-2 the first work in the field capable of decoding coherent and readable sentences from non-invasive brain signals. Our experiments highlight significant advancements over prior techniques in both quantitative and qualitative measures, achieving a BLEU-1 score of 52.2% on the ZuCo dataset. Furthermore, BELT-2 shows an improvement ranging from 31% to 162% on other translation benchmarks. Code can be accessed via the provided anonymous link (https://anonymous.4open.science/r/BELT-2-0048).
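
For readers unfamiliar with the BLEU-1 metric quoted in the abstract, the following is a minimal sketch of sentence-level BLEU-1: clipped unigram precision multiplied by a brevity penalty. Whitespace tokenization, a single reference sentence, and the function name `bleu1` are assumptions made for illustration; the evaluation tooling actually used for the reported score is not stated here and may differ (for example, in tokenization or corpus-level aggregation).

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative sketch)."""
    cand = candidate.split()  # assumption: plain whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clip each candidate word count by how often the word occurs in the
    # reference, so repeating a correct word cannot inflate the precision.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)

    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * precision


# Every candidate word matches, but the candidate is one word short,
# so only the brevity penalty reduces the score.
score = bleu1("the movie was good", "the movie was very good")
print(f"BLEU-1 = {score:.4f}")
```

Under this definition, a score of 52.2% would mean that roughly half of the decoded unigrams match the reference after clipping and length adjustment.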

Paper Structure

This paper contains 27 sections, 11 equations, 11 figures, and 10 tables.

Figures (11)

  • Figure 1: Overview of BELT-2, the first work on multi-task brain decoding, which bridges the Q-Conformer EEG encoder and LLMs. The provided samples also suggest that BELT-2 is the first to achieve fluent sentence decoding from noninvasive brain signals.
  • Figure 2: The overall structure of the Q-Conformer. It consists of a discrete conformer, a context transformer (C-Former), and a query prompt. The input EEG embeddings (EEG embed) are first processed by the conformer into continuous EEG tokens. A vector quantizer then discretizes the EEG tokens. Finally, a query prompt interacts with the discrete EEG tokens via the cross-attention layer in the C-Former to extract task-specific context information.
  • Figure 3: BELT-2's two-stage training schema. For EEG-to-language alignment learning (left), we jointly optimize three objectives that firmly establish the EEG-to-language alignment and force the query prompt to extract the EEG context most relevant to a task. To bridge the Q-Conformer and the LLM (right), we connect a frozen EEG model (Q-Conformer) and a frozen LLM by tuning the continuous virtual prefix with the prefix-tuning method. Speculative augmentation is used to boost the performance of the prefix-tuning process.
  • Figure 4: The illustration of BPE-level contrastive learning.
  • Figure 5: For multi-task training, we train three tasks simultaneously by randomly sampling tasks for each training iteration. Each task-specific query prompt learns to provide task-specific instructions by training on the corresponding task-specific objective function.
  • ...and 6 more figures
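
The discretization step mentioned in the Figure 2 caption (continuous EEG tokens mapped to discrete ones by a vector quantizer) can be sketched as a nearest-neighbor codebook lookup. This is a generic vector-quantization sketch, not the paper's implementation: the codebook values, the token dimension, and the names `quantize` and `squared_distance` are hypothetical, and a trained quantizer would learn its codebook rather than use fixed entries.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    tokens: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous token to the index and value of its nearest codebook entry."""
    indices = []
    for token in tokens:
        distances = [squared_distance(token, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


# Hypothetical 2-dimensional codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
# Hypothetical continuous EEG tokens, standing in for the conformer output.
eeg_tokens = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices, quantized = quantize(eeg_tokens, codebook)
print(indices)    # discrete token ids
print(quantized)  # codebook vectors that replace the continuous tokens
```

In the pipeline the caption describes, the discrete tokens produced this way are what the query prompt attends to through the cross-attention layer in the C-Former.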