NeuGPT: Unified multi-modal Neural GPT

Yiqian Yang, Yiqun Duan, Hyejeong Jo, Qiang Zhang, Renjing Xu, Oiwi Parker Jones, Xuming Hu, Chin-teng Lin, Hui Xiong

TL;DR

NeuGPT is a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research, architected to process a diverse array of neural recordings and interact with speech and text data.

Abstract

This paper introduces NeuGPT, a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research. Traditionally, studies in the field have been compartmentalized by signal type, with EEG, MEG, ECoG, SEEG, fMRI, and fNIRS data being analyzed in isolation. Recognizing the untapped potential for cross-pollination and the adaptability of neural signals across varying experimental conditions, we set out to develop a unified model capable of interfacing with multiple modalities. Drawing inspiration from the success of pre-trained large models in NLP, computer vision, and speech processing, NeuGPT is architected to process a diverse array of neural recordings and interact with speech and text data. Our model mainly focuses on brain-to-text decoding, improving the state of the art (SOTA) from 6.94 to 12.92 on BLEU-1 and from 6.93 to 13.06 on ROUGE-1F. It can also simulate brain signals, thereby serving as a novel neural interface. Code is available at https://github.com/NeuSpeech/NeuGPT.

Paper Structure

This paper contains 32 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: (Left) NeuGPT overview: NeuGPT is designed to tackle the conversion between neural signals and other modalities such as text and speech. It uses 3 tokenizers for different modalities, converting all into tokens for the LLM to process. (Right) NeuTokenizer architecture: NeuTokenizer consists of an encoder, quantizer, decoder, and discriminator. The tokenizer is trained on a single channel, using the discriminator to improve synthetic results.
  • Figure 2: Temporal recovery of the neural signal by NeuTokenizer. The x-axis is the sample index.
  • Figure 3: Spectral recovery of the neural signal by NeuTokenizer across different scales. Abbreviations: gt = ground truth, pd = prediction, mag = magnitude, ang = angle. The numbers 1-5 index the different STFT scales.
  • Figure 4: Future NeuGPT
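
The Figure 3 caption compares ground truth (gt) and prediction (pd) by STFT magnitude and angle at several scales. The following is a minimal sketch of that kind of comparison, not the authors' implementation: the function names (`stft`, `multiscale_spectra`, `mean_abs_mag_error`), the three window sizes, the hop of one quarter window, the Hann window, and the toy sine signal are all assumptions chosen for illustration. It uses only the Python standard library, so the DFT is computed directly rather than with an FFT.

```python
import cmath
import math


def stft(signal, n_fft, hop):
    """Return a list of frames; each frame holds complex DFT bins 0..n_fft//2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        bins = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(bins)
    return frames


def multiscale_spectra(signal, scales=(16, 32, 64)):
    """Map each window size to (magnitude, angle) frames. Scales are illustrative."""
    spectra = {}
    for n_fft in scales:
        frames = stft(signal, n_fft, hop=n_fft // 4)
        mag = [[abs(c) for c in frame] for frame in frames]
        ang = [[cmath.phase(c) for c in frame] for frame in frames]
        spectra[n_fft] = (mag, ang)
    return spectra


def mean_abs_mag_error(gt, pd, scales=(16, 32, 64)):
    """Mean absolute difference between gt and pd magnitudes, per scale."""
    gt_spec = multiscale_spectra(gt, scales)
    pd_spec = multiscale_spectra(pd, scales)
    errors = {}
    for n_fft in scales:
        gt_mag, pd_mag = gt_spec[n_fft][0], pd_spec[n_fft][0]
        total = sum(
            abs(a - b)
            for gt_frame, pd_frame in zip(gt_mag, pd_mag)
            for a, b in zip(gt_frame, pd_frame)
        )
        count = sum(len(frame) for frame in gt_mag)
        errors[n_fft] = total / count
    return errors


# Toy data (hypothetical): a sine wave as "ground truth", an attenuated copy as "prediction".
gt = [math.sin(2 * math.pi * 8 * t / 128) for t in range(128)]
pd = [0.9 * x for x in gt]
errors = mean_abs_mag_error(gt, pd)
```

Here `errors` maps each window size to one scalar, so a reconstruction can be checked at coarse and fine time-frequency resolution separately. Short windows resolve fast transients, long windows resolve narrow frequency bands, which is the usual motivation for inspecting several STFT scales at once.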