MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection

Li Songyi; Zheng Linze; Liang Jinghua; Zhang Zifeng

TL;DR

MEBM-Speech performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states, a capability that matters for both cognitive neuroscience and clinical applications.

Abstract

We propose MEBM-Speech, a multi-scale enhanced neural decoder for speech activity detection from non-invasive magnetoencephalography (MEG) signals. Built upon the BrainMagic backbone, MEBM-Speech integrates three complementary temporal modeling mechanisms: a multi-scale convolutional module for short-term pattern extraction, a bidirectional LSTM (BiLSTM) for long-range context modeling, and a depthwise separable convolutional layer for efficient cross-scale feature fusion. A lightweight temporal jittering strategy and average pooling further improve onset robustness and boundary stability. The model performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states, a capability that matters for both cognitive neuroscience and clinical applications. Evaluations on the LibriBrain Competition 2025 Track 1 benchmark show strong performance: an average macro F1 of 89.3% on the validation set and comparable results on the official test leaderboard. These findings highlight the effectiveness of multi-scale temporal representation learning for robust MEG-based speech decoding.
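To make the reported metric concrete, the following is a minimal sketch of how per-sample speech probabilities might be turned into labels and scored with macro F1. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `smooth_and_threshold` and `macro_f1`, the window of 3 samples, and the 0.5 threshold are all hypothetical, and the sketch assumes average pooling is applied to the output probabilities, which the abstract does not specify.

```python
def smooth_and_threshold(probs, window=3, threshold=0.5):
    """Average-pool probabilities over a centered window, then threshold.

    Assumption for illustration only: pooling is applied to the output
    probabilities. The window is truncated at the sequence edges.
    """
    half = window // 2
    labels = []
    for i in range(len(probs)):
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        mean = sum(probs[lo:hi]) / (hi - lo)
        labels.append(1 if mean >= threshold else 0)
    return labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1 scores for silence (0) and speech (1)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up numbers): one speech segment with a brief dip.
y_true = [0, 0, 1, 1, 1, 1, 0, 0]
probs = [0.1, 0.3, 0.9, 0.4, 0.9, 0.8, 0.1, 0.1]

raw_labels = [1 if p >= 0.5 else 0 for p in probs]
smoothed_labels = smooth_and_threshold(probs)

raw_score = macro_f1(y_true, raw_labels)
smoothed_score = macro_f1(y_true, smoothed_labels)
```

In this toy case the dip at the fourth sample splits the speech segment when thresholding directly, while the pooled version keeps the segment intact, which is one plausible reading of how pooling could improve boundary stability.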

Paper Structure (8 sections, 1 figure, 1 table)

This paper contains 8 sections, 1 figure, 1 table.

Introduction
Methods
Decoding Strategy
Model Architecture
Experiments
Experimental Setup
Results and Ablation
Conclusion

Figures (1)

Figure 1: Overall architecture of the proposed MEBM-Speech model. (a) The complete processing pipeline. (b) The spatial attention module enhances sensor-level representations by learning spatial relevance weights across MEG channels. (c) The BrainMagic (BM) encoder extracts mid-term contextual features from the spatially weighted signals. (d) The short-term multi-scale convolutional module captures fine-grained temporal dependencies using multiple receptive fields. (e) The depthwise separable convolutional layer further refines the temporal representations with lightweight channel-wise and pointwise filtering.
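Panels (d) and (e) can be illustrated with a small NumPy sketch of the two operations involved: filtering the same signal at several receptive-field sizes, and a depthwise separable convolution (one temporal filter per channel, followed by a pointwise mix across channels). This is a sketch under assumptions, not the model's implementation: the kernel sizes (3, 7, 15), the fixed averaging kernels standing in for learned weights, and all function names are hypothetical.

```python
import numpy as np


def depthwise_conv1d(x, kernels):
    """Filter each channel with its own temporal kernel.

    x: array of shape (channels, time); kernels: shape (channels, k).
    Assumes time >= k so that mode="same" preserves the time length.
    """
    return np.stack(
        [np.convolve(x[c], kernels[c], mode="same") for c in range(x.shape[0])]
    )


def pointwise_conv1d(x, weights):
    """Mix channels at every time step; weights: (out_channels, in_channels)."""
    return weights @ x


def multi_scale_features(x, kernel_sizes=(3, 7, 15)):
    """Filter x at several temporal scales and concatenate along channels.

    The averaging kernels are placeholders for learned filters.
    """
    branches = []
    for k in kernel_sizes:
        kernels = np.full((x.shape[0], k), 1.0 / k)
        branches.append(depthwise_conv1d(x, kernels))
    return np.concatenate(branches, axis=0)


# Toy input: 4 feature channels, 50 time steps (made-up sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 50))

feats = multi_scale_features(x)                    # 3 scales x 4 channels
refined = depthwise_conv1d(feats, np.full((feats.shape[0], 5), 0.2))
fused = pointwise_conv1d(refined, rng.standard_normal((8, feats.shape[0])))
```

Splitting the convolution into a channel-wise step and a pointwise step is what keeps the layer lightweight: with C input channels, C' output channels, and kernel length k, it uses C*k + C*C' weights rather than the C*C'*k of a full convolution.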
