
Dynamic Neural Communication: Convergence of Computer Vision and Brain-Computer Interface

Ji-Ha Park, Seo-Hyun Lee, Soowon Kim, Seong-Whan Lee

TL;DR

The results demonstrate the potential to rapidly capture and reconstruct lip movements during natural speech attempts from human neural signals, enabling dynamic neural communication through the convergence of computer vision and brain-computer interface.

Abstract

Interpreting human neural signals to decode static speech intentions, such as text or images, and dynamic speech intentions, such as audio or video, shows great potential as an innovative communication tool. Human communication involves various features, such as articulatory movements, facial expressions, and internal speech, all of which are reflected in neural signals. However, most studies generate only short or fragmented outputs, and providing informative communication by leveraging these diverse features of neural signals remains challenging. In this study, we introduce a dynamic neural communication method that leverages current computer vision and brain-computer interface technologies. Our approach captures the user's intentions from neural signals and decodes visemes in short time steps to produce dynamic visual outputs. The results demonstrate the potential to rapidly capture and reconstruct lip movements during natural speech attempts from human neural signals, enabling dynamic neural communication through the convergence of computer vision and brain-computer interface.
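Decoding visemes in short time steps presupposes that each short EEG segment carries a viseme label. The Figure 1 caption states that the EEG is segmented at phoneme intervals and that each unit is mapped to a condensed viseme class. The sketch below illustrates that labelling step under stated assumptions: phoneme intervals are given as hypothetical `(start_s, end_s, phoneme)` triples in seconds, the EEG is a plain channels x samples list, and `PHONEME_TO_VISEME` is a hypothetical, partial grouping of ARPAbet phonemes. The paper's actual viseme class set is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL condensed phoneme-to-viseme table (ARPAbet symbols).
# Illustrative only; this is not the viseme class set used in the paper.
PHONEME_TO_VISEME: Dict[str, str] = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "AA": "open", "AE": "open", "AH": "open",
    "UW": "rounded", "OW": "rounded", "W": "rounded",
    "IY": "spread", "EH": "spread",
    "S": "alveolar", "Z": "alveolar", "T": "alveolar", "D": "alveolar", "N": "alveolar",
}


def segment_by_phonemes(
    eeg: Sequence[Sequence[float]],                  # channels x samples
    fs: float,                                       # sampling rate in Hz
    intervals: Sequence[Tuple[float, float, str]],   # (start_s, end_s, phoneme)
) -> List[Tuple[List[List[float]], str]]:
    """Cut each phoneme interval out of every EEG channel and attach its viseme label."""
    segments: List[Tuple[List[List[float]], str]] = []
    for start_s, end_s, phoneme in intervals:
        viseme = PHONEME_TO_VISEME.get(phoneme)
        if viseme is None:
            continue  # phonemes missing from the hypothetical table are skipped
        lo, hi = int(round(start_s * fs)), int(round(end_s * fs))
        segments.append(([list(ch[lo:hi]) for ch in eeg], viseme))
    return segments


if __name__ == "__main__":
    eeg = [[float(i) for i in range(10)], [float(i) for i in range(10, 20)]]
    segs = segment_by_phonemes(eeg, 10.0, [(0.0, 0.3, "P"), (0.3, 0.6, "AA")])
    print([label for _, label in segs])
```

Skipping unmapped phonemes keeps the example short; a real pipeline would more likely map every phoneme, including silence, to some class.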

Paper Structure

This paper contains 10 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overall architecture of the sentence-based viseme decoding framework from overt speech EEG. The EEG signals from spoken sentences are segmented based on phoneme intervals, and the units are mapped to condensed viseme classes. The embedded features are adjusted to variable time lengths t and used for training. The diffusion-based EEG signal decoding model is trained to effectively capture the viseme information. Finally, the predicted viseme sequences are sequentially reconstructed to form a complete sentence.
  • Figure 2: The predicted labels from EEG segments are arranged into a one-dimensional viseme sequence. The incomplete sequence is reconstructed into a complete sentence using a pre-trained LSTM model. The LSTM model was trained on the original viseme sequences of 50 predefined sentences.
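The Figure 1 caption says the embedded features are adjusted to variable time lengths t but does not say how. One plausible reading is that each variable-length segment is brought to a common length t before training. The sketch below assumes linear interpolation per channel; both the method and the helper names `resample` and `resample_segment` are hypothetical illustrations, not the authors' procedure.

```python
from typing import List, Sequence


def resample(channel: Sequence[float], t: int) -> List[float]:
    """Linearly interpolate a 1-D sequence to exactly t samples (t >= 1)."""
    n = len(channel)
    if n == 0 or t < 1:
        raise ValueError("need a non-empty channel and t >= 1")
    if n == 1 or t == 1:
        return [float(channel[0])] * t
    out: List[float] = []
    for i in range(t):
        x = i * (n - 1) / (t - 1)   # fractional source index in [0, n - 1]
        j = min(int(x), n - 2)      # left neighbour, clamped so j + 1 stays valid
        frac = x - j
        out.append(float(channel[j]) * (1.0 - frac) + float(channel[j + 1]) * frac)
    return out


def resample_segment(segment: Sequence[Sequence[float]], t: int) -> List[List[float]]:
    """Apply resample() to every channel of a channels x samples segment."""
    return [resample(ch, t) for ch in segment]


if __name__ == "__main__":
    print(resample([1.0, 2.0, 3.0], 5))
```

Interpolation keeps the segment's start and end values and spreads samples evenly in between; cropping or zero-padding to length t would be equally plausible alternatives.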
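Figure 2 describes reconstructing an incomplete predicted viseme sequence into one of 50 predefined sentences with a pre-trained LSTM. As a much simpler stand-in that illustrates the same closed-set idea, the sketch below swaps in nearest-neighbour matching: it returns the predefined sentence whose original viseme sequence has the smallest Levenshtein (edit) distance to the prediction. This is not the paper's LSTM, and the sentence texts and viseme labels in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two label sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def reconstruct_sentence(
    predicted: Sequence[str],
    sentence_visemes: Dict[str, List[str]],  # sentence text -> original viseme sequence
) -> Tuple[str, int]:
    """Return the closest predefined sentence and its edit distance (ties: first listed)."""
    best = min(sentence_visemes, key=lambda s: levenshtein(predicted, sentence_visemes[s]))
    return best, levenshtein(predicted, sentence_visemes[best])


if __name__ == "__main__":
    sentences = {  # HYPOTHETICAL sentences and viseme sequences
        "hello": ["alveolar", "spread", "alveolar", "rounded"],
        "move up": ["bilabial", "rounded", "labiodental", "open", "bilabial"],
    }
    predicted = ["bilabial", "rounded", "open", "bilabial"]  # one viseme missing
    print(reconstruct_sentence(predicted, sentences))
```

Because the output is restricted to the predefined set, a missing or wrong viseme still maps to a complete sentence, which is the behaviour Figure 2 attributes to the reconstruction step; a learned sequence model can additionally exploit which errors are likely.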