
Copilot-Assisted Second-Thought Framework for Brain-to-Robot Hand Motion Decoding

Yizhe Li, Shixiao Wang, Jian K. Liu

Abstract

Motor kinematics prediction (MKP) from electroencephalography (EEG) is an important research area for developing movement-related brain-computer interfaces (BCIs). While traditional methods often rely on convolutional neural networks (CNNs) or recurrent neural networks (RNNs), Transformer-based models have shown a strong ability to model long EEG sequences. In this study, we propose a CNN-attention hybrid model for decoding hand kinematics from EEG during grasp-and-lift tasks, achieving strong performance in within-subject experiments. We further extend this approach to EEG-EMG multimodal decoding, which yields substantially improved results. Within-subject tests achieve PCC values of 0.9854, 0.9946, and 0.9065 for the X, Y, and Z axes, respectively, computed on the midpoint trajectory between the thumb and index finger, while cross-subject tests yield 0.9643, 0.9795, and 0.5852. The decoded trajectories from both modalities are then used to control a Franka Panda robotic arm in a MuJoCo simulation. To enhance trajectory fidelity, we introduce a copilot framework that filters low-confidence decoded points using a motion-state-aware critic within a finite-state machine. This post-processing step improves the overall within-subject PCC of EEG-only decoding to 0.93 while excluding fewer than 20% of the data points.
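
The PCC figures above are computed per axis on the midpoint trajectory between the thumb and index finger. A minimal evaluation sketch is given below (NumPy; the array names and shapes are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

def pearson_cc(pred, true):
    """Pearson correlation coefficient between two 1-D trajectories."""
    return np.corrcoef(np.asarray(pred, float), np.asarray(true, float))[0, 1]

def midpoint_pcc(pred_thumb, pred_index, true_thumb, true_index):
    """Per-axis PCC on the thumb/index-finger midpoint trajectory.

    All inputs are assumed to be (T, 3) arrays of X, Y, Z positions.
    """
    pred_mid = (np.asarray(pred_thumb) + np.asarray(pred_index)) / 2.0
    true_mid = (np.asarray(true_thumb) + np.asarray(true_index)) / 2.0
    return {axis: pearson_cc(pred_mid[:, i], true_mid[:, i])
            for i, axis in enumerate("XYZ")}
```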

Paper Structure

This paper contains 26 sections, 3 equations, and 7 figures.

Figures (7)

  • Figure 1: Structure of the decoding model.
  • Figure 2: Structure of the multi-convolution block (top). Structure of the self-attention block (bottom). A minimal sketch of this CNN-attention stack follows the figure list.
  • Figure 3: Structure of the copilot and decoding point filtering workflow. The grasp-and-lift movement is segmented into five states (SEARCHING, LIFTING, HOLDING, PUTTING, RETURNING), each with different confidence thresholds. UNRELY is a virtual state representing points misclassified as belonging to the current state, which are assigned a higher decoding threshold for filtering. A sketch of this filtering loop also follows the figure list.
  • Figure 4: (Left) Within-subject decoding performance for different models (EEGNet, DeepConvNet, Transformer, EEG-TCN, EEG-only, and EMG–EEG fusion). (Right) Within-subject and cross-subject decoding performance for EEG-only and EMG–EEG fusion models.
  • Figure 5: (Left) Model performance with varying input window sizes (50–1000 samples) for participant 4 with a 200 ms delay. Step size for the sliding window was one-fifth of the window length. (Right) Model performance with varying kinematic data delays (50–350 samples) for participant 4 with a fixed input length of 250 samples.
  • ...and 2 more figures
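
Figures 1 and 2 describe the decoder as a multi-convolution block feeding a self-attention block. Below is a minimal PyTorch sketch of such a CNN-attention hybrid; the channel counts, kernel sizes, head count, and pooling choice are placeholder assumptions, not the paper's reported architecture:

```python
import torch
import torch.nn as nn

class MultiConvBlock(nn.Module):
    """Parallel temporal convolutions with different kernel sizes (placeholder values)."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(7, 15, 31)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.proj = nn.Conv1d(out_ch * len(kernel_sizes), out_ch, kernel_size=1)

    def forward(self, x):                         # x: (batch, channels, time)
        x = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.proj(x))

class CNNAttentionDecoder(nn.Module):
    """CNN-attention hybrid mapping an EEG window to 3-D hand kinematics."""
    def __init__(self, n_eeg_ch=32, d_model=64, n_heads=4, n_out=3):
        super().__init__()
        self.conv = MultiConvBlock(n_eeg_ch, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_out)

    def forward(self, x):                         # x: (batch, eeg_channels, time)
        h = self.conv(x).transpose(1, 2)           # (batch, time, d_model)
        h, _ = self.attn(h, h, h)                  # self-attention over time steps
        return self.head(h.mean(dim=1))            # (batch, 3): X, Y, Z position
```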
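
Figure 3's copilot segments the grasp-and-lift movement into five states and rejects decoded points whose confidence does not clear a state-dependent threshold. A minimal sketch of that state-aware filtering loop follows; the threshold values and the critic interface (is_unreliable) are hypothetical placeholders used only for illustration:

```python
from enum import Enum, auto

class State(Enum):
    SEARCHING = auto()
    LIFTING = auto()
    HOLDING = auto()
    PUTTING = auto()
    RETURNING = auto()

# Placeholder per-state confidence thresholds; points the critic flags as
# misclassified into the current state (UNRELY) face a stricter threshold.
STATE_THRESHOLDS = {state: 0.5 for state in State}
UNRELY_THRESHOLD = 0.8

def filter_decoded_points(points, confidences, states, critic):
    """Keep a decoded point only if its confidence clears the threshold
    of its motion state; suspected UNRELY points use the stricter bar."""
    kept = []
    for point, conf, state in zip(points, confidences, states):
        threshold = (UNRELY_THRESHOLD if critic.is_unreliable(point, state)
                     else STATE_THRESHOLDS[state])
        if conf >= threshold:
            kept.append(point)
    return kept
```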