MAD: Multi-Alignment MEG-to-Text Decoding

Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Xuming Hu, Won Hee Lee, Renjing Xu, Hui Xiong

TL;DR

This work addresses open-vocabulary MEG-to-text decoding by introducing MAD, an end-to-end framework that aligns MEG signals with speech representations at multiple levels. Using a dual-stream architecture and a composite loss that couples acoustic, semantic, and textual signals, the method learns mappings from brain activity to unseen text without relying on teacher forcing. Experiments on the GWilliams dataset show that semantic-level alignment is crucial: the model achieves a BLEU-1 of 6.86 without teacher forcing, substantially outperforming EEG/MEG baselines, and gains further when teacher forcing is used. The results suggest that aligning brain activity with intermediate speech representations may generalize better to novel linguistic content, offering a promising direction for non-invasive BCI communication systems.
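The TL;DR reports results both with and without teacher forcing. The difference matters because, under teacher forcing, the decoder is fed the ground-truth previous token at every step, so a single mistake cannot derail the rest of the sentence; in free-running decoding it consumes its own predictions and errors compound. The toy sketch below illustrates this effect only. The lookup-table "decoder" (`TOY_DECODER`) and the helper names are hypothetical and stand in for a trained text decoder; they are not the MAD model.

```python
# Hypothetical toy decoder: maps the previous token to a predicted next token.
# It stands in for a trained text decoder; it is NOT the MAD model.
TOY_DECODER = {
    "<s>": "the",
    "the": "cat",   # one deliberate mistake: the reference continues with "dog"
    "cat": "sat",
    "dog": "ran",
    "ran": "home",
    "sat": "down",
}


def decode_teacher_forced(reference):
    """Condition every step on the ground-truth previous token."""
    previous = ["<s>"] + reference[:-1]
    return [TOY_DECODER.get(tok, "<unk>") for tok in previous]


def decode_free_running(length):
    """Condition every step on the decoder's own previous prediction."""
    outputs, previous = [], "<s>"
    for _ in range(length):
        previous = TOY_DECODER.get(previous, "<unk>")
        outputs.append(previous)
    return outputs


def token_accuracy(predicted, reference):
    """Fraction of positions where the prediction matches the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


reference = ["the", "dog", "ran", "home"]
forced = decode_teacher_forced(reference)
free = decode_free_running(len(reference))
print(forced, token_accuracy(forced, reference))  # ['the', 'cat', 'ran', 'home'] 0.75
print(free, token_accuracy(free, reference))      # ['the', 'cat', 'sat', 'down'] 0.25
```

The same single error costs one token under teacher forcing but three tokens in free-running mode, which is why scores obtained without teacher forcing are the more realistic measure of open-vocabulary decoding.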

Abstract

Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive techniques for recording brain signals, including electroencephalography (EEG) and magnetoencephalography (MEG), are becoming increasingly popular because they are safe and practical and avoid invasive electrode implantation. However, current work has under-investigated three issues: 1) a predominant focus on EEG, with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that generalize better to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which may limit our ability to understand the intricate dynamics of brain activity. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first end-to-end multi-alignment framework for generating entirely unseen text directly from MEG signals. On the GWilliams dataset, it raises the BLEU-1 score from the baseline's 5.49 to 6.86. This improvement is a step toward real-world applications and underscores the model's potential for advancing BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.
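BLEU-1 is the headline metric here. As a reference point, the sketch below computes the standard sentence-level BLEU-1 against a single reference: clipped unigram precision multiplied by a brevity penalty. It is an assumption that the paper's numbers follow this textbook definition; the exact implementation (tokenization, corpus-level aggregation, smoothing) is not stated in this summary, and the example sentences are made up. The last lines simply work out the absolute and relative gain implied by the reported scores.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 against a single reference (token lists)."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate word count by how often it occurs in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


reference = "the cat sat on the mat".split()
candidate = "the the the cat".split()
print(round(100 * bleu1(candidate, reference), 2))  # 45.49

# Gain implied by the abstract's scores (BLEU-1 reported on a 0-100 scale).
absolute_gain = 6.86 - 5.49
relative_gain = absolute_gain / 5.49
print(round(absolute_gain, 2), round(relative_gain, 3))  # 1.37 0.25
```

Clipping stops a candidate from earning credit by repeating a common word ("the" counts at most twice here), and the brevity penalty stops very short outputs from scoring high precision. The reported improvement is about 1.37 BLEU-1 points, roughly 25% relative to the baseline.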

Paper Structure

This paper contains 12 sections, 4 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: (a) Overview of the MAD architecture. Our model employs a dual-stream design for multi-level alignment between the MEG and speech modalities. Alignments are enforced at the level of Mel spectrograms ($M_1, M_2$), encoder hidden states ($E_1, E_2$), and output text ($T_1, T_2$). (b) Detailed architecture of the Brain Module (adapted from Défossez et al., 2023), which transforms raw MEG signals ($\varepsilon$) into a predicted Mel spectrogram ($M_1$).
  • Figure 2: Comparison of ground truth and predicted Mel spectrograms for two test samples. The model captures the overall structure and temporal dynamics (e.g., speech pauses) of the audio.
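Figure 1 describes alignment at three levels: Mel spectrograms ($M_1, M_2$), encoder hidden states ($E_1, E_2$), and output text ($T_1, T_2$). The sketch below shows one way such a composite objective could be written as a weighted sum of per-level terms. Every specific choice in it is an assumption for illustration, not the paper's loss: mean squared error for the Mel term, mean per-frame cosine distance for the hidden-state term, negative log-likelihood for the text term (treating $T_2$ as target token ids), equal weights, and all function names.

```python
import math


def mse(a, b):
    """Mean squared error between two equally shaped frames x bins matrices."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_cosine_distance(e1, e2):
    """Average per-frame cosine distance between two hidden-state sequences."""
    return sum(cosine_distance(u, v) for u, v in zip(e1, e2)) / len(e1)


def token_nll(log_probs, target_ids):
    """Mean negative log-likelihood of target ids under per-step log-probabilities."""
    return -sum(step[t] for step, t in zip(log_probs, target_ids)) / len(target_ids)


def multi_alignment_loss(m1, m2, e1, e2, t1_log_probs, t2_ids, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of Mel-, encoder- and text-level alignment terms."""
    w_mel, w_enc, w_txt = weights
    return (
        w_mel * mse(m1, m2)                         # acoustic: M1 vs M2 (assumed MSE)
        + w_enc * mean_cosine_distance(e1, e2)      # semantic: E1 vs E2 (assumed cosine)
        + w_txt * token_nll(t1_log_probs, t2_ids)   # textual: T1 vs T2 (assumed NLL)
    )


# Tiny example: identical Mel and hidden states, uniform 2-way text prediction.
m = [[0.0, 1.0], [2.0, 3.0]]
e = [[1.0, 0.0], [0.0, 1.0]]
log_probs = [[math.log(0.5), math.log(0.5)]]
loss = multi_alignment_loss(m, m, e, e, log_probs, [0])
print(round(loss, 4))  # 0.6931 (only the text term is non-zero: -log 0.5)
```

Keeping the three terms separate with their own weights makes it easy to ablate one level at a time, which is the kind of comparison behind the TL;DR's finding that semantic-level alignment is crucial.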