MAD: Multi-Alignment MEG-to-Text Decoding
Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Xuming Hu, Won Hee Lee, Renjing Xu, Hui Xiong
TL;DR
This work addresses open-vocabulary MEG-to-text decoding by introducing MAD, an end-to-end framework that aligns MEG signals with speech representations at multiple levels. By leveraging a dual-stream architecture and a composite loss that couples acoustic, semantic, and textual signals, the method learns robust mappings from brain activity to unseen text without relying on teacher forcing. Experiments on the GWilliams dataset show that semantic-level alignment is crucial, achieving a BLEU-1 of 6.86 without teacher forcing and substantially outperforming EEG/MEG baselines, with further gains when employing teacher forcing. The results suggest that aligning brain activity with intermediate speech representations can generalize better to novel linguistic content, offering a promising direction for non-invasive BCI communication systems.
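To make the idea of a composite, multi-level loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the three terms (mean squared error for the acoustic level, cosine distance for the semantic level, cross-entropy for the textual level), the function names, and the weights `w_acoustic`, `w_semantic`, and `w_text` are all illustrative assumptions, since the summary does not specify them.

```python
import math

def mse(pred, target):
    # Acoustic-level stand-in (assumed): mean squared error between two feature vectors.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cosine_distance(a, b):
    # Semantic-level stand-in (assumed): 1 - cosine similarity of two embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cross_entropy(probs, target_index):
    # Text-level stand-in (assumed): negative log-likelihood of the target token.
    return -math.log(probs[target_index])

def composite_loss(acoustic, semantic, textual,
                   w_acoustic=1.0, w_semantic=1.0, w_text=1.0):
    # Weighted sum of the three alignment terms; the weights are hypothetical.
    return w_acoustic * acoustic + w_semantic * semantic + w_text * textual

# Toy values only; real inputs would come from the MEG and speech encoders.
loss = composite_loss(
    mse([1.0, 2.0], [1.0, 4.0]),
    cosine_distance([1.0, 0.0], [1.0, 0.0]),
    cross_entropy([0.5, 0.5], 0),
)
```

Setting one weight to zero in such a sketch is a simple way to ablate a single alignment level, which is the kind of comparison behind the claim that semantic-level alignment matters most.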
Abstract
Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive recording techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular because they are safe and practical and require no electrode implantation. However, existing work leaves three issues under-investigated: 1) a predominant focus on EEG, with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that generalize better to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which may limit how fully the dynamics of brain activity can be captured. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for generating entirely unseen text directly from MEG signals. On the GWilliams dataset, our model raises the BLEU-1 score from the baseline's 5.49 to 6.86. This improvement moves the model closer to real-world applications and underscores its potential for advancing BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.
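For readers unfamiliar with the metric, the following is a minimal sentence-level BLEU-1 sketch: clipped unigram precision multiplied by the brevity penalty. It assumes whitespace tokenization, lowercasing, a single reference, and a 0 to 100 scale; the paper's evaluation may use a different tokenizer or corpus-level aggregation, so this illustrates the definition rather than reproducing the reported scores. The function name `bleu1` is hypothetical.

```python
import math
from collections import Counter

def bleu1(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-1 on a 0-100 scale (single reference, whitespace tokens)."""
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    hyp_counts = Counter(hyp_tokens)

    # Clip each hypothesis word's count by its count in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = clipped / len(hyp_tokens)

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref_tokens) / len(hyp_tokens))

    return 100.0 * brevity_penalty * precision

score = bleu1("the cat sat on the mat", "the the the cat")
```

In this toy example three of the four hypothesis words survive clipping, and the short hypothesis is further scaled down by the brevity penalty.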
