Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

Sheng Feng; Heyang Liu; Yu Wang; Yanfeng Wang

TL;DR

This work showcases the efficacy of combining LLMs with E2E decoding for enhancing speech neuroprosthesis and sets a new direction for future research in BCI applications, underscoring the impact of LLMs in decoding complex neural signals for communication restoration.

Abstract

In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade models. Our findings underscore the immense potential of E2E frameworks in speech neuroprosthesis, particularly as the technology behind brain-computer interfaces (BCIs) and the availability of relevant datasets continue to evolve. This work not only showcases the efficacy of combining LLMs with E2E decoding for enhancing speech neuroprosthesis but also sets a new direction for future research in BCI applications, underscoring the impact of LLMs in decoding complex neural signals for communication restoration. Code will be made available at https://github.com/FsFrancis15/BrainLLM.

Paper Structure (15 sections, 1 equation, 3 figures, 2 tables)

Introduction
Related Works
Speech neuroprosthesis
Automatic speech recognition
Proposed Method
End-to-End brain-to-text framework
Feature extractor
LLM decoder
Multi-stage training
Experiment
Dataset
Data preprocessing
Experimental settings
Result and Discussion
Conclusion

Figures (3)

Figure 1: Diagram of the proposed end-to-end invasive brain signal decoding framework. The multimodal tokens generated by the feature extractor from processed brain signals are used for LLM decoding.
Figure 2: Multi-stage training strategy. In the modality alignment stage (left), the LLM is frozen. In the LLM finetuning stage (right), the feature extractor and most parameters of the LLM are frozen.
Figure 3: WER on test dataset for different feature extractors and LLMs after modality alignment stage (left) and LLM finetuning stage (right). "BPE" and "Phone" refer to the pre-training task of the feature extractor: brain to BPE unit task and brain to phoneme task. "BGRU" means using bidirectional GRU as the structure of the feature extractor.
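
Figure 3 reports results as WER (word error rate). For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference sentence and a decoded hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the absence of text normalization are assumptions made for illustration; the paper's own scoring pipeline may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split, with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.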
