Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings

Di Wu, Siyuan Li, Chen Feng, Lu Cao, Yue Zhang, Jie Yang, Mohamad Sawan

TL;DR

Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, significantly outperforms the conventional heterogeneous decoding approach and empirically confirm that H2DiLR effectively captures both homogeneity and heterogeneity during neural representation learning.

Abstract

Recent advancements in brain-computer interfaces (BCIs) have enabled the decoding of lexical tones from intracranial recordings, offering the potential to restore the communication abilities of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Traditional subject-specific models, which operate under a heterogeneous decoding paradigm, fail to capture generalized neural representations and cannot effectively leverage data across subjects. To address these limitations, we introduce Homogeneity-Heterogeneity Disentangled Learning for neural Representations (H2DiLR), a novel framework that disentangles and learns both the homogeneity and heterogeneity from intracranial recordings across multiple subjects. To evaluate H2DiLR, we collected stereoelectroencephalography (sEEG) data from multiple participants reading Mandarin materials comprising 407 syllables, representing nearly all Mandarin characters. Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, significantly outperforms the conventional heterogeneous decoding approach. Furthermore, we empirically confirm that H2DiLR effectively captures both homogeneity and heterogeneity during neural representation learning.

Paper Structure

This paper contains 33 sections, 8 equations, 6 figures, 13 tables.

Figures (6)

  • Figure 1: Illustration of H2DiLR for unified lexical tone decoding with sEEG from multiple participants. In the homo-heterogeneity disentanglement (H2D) stage, the continuous latent representations from the encoders are disentangled into H2D representations, which are constructed by discretized code embeddings in a shared codebook (homogeneous tone articulation neural codes) and private codebooks (heterogeneous personalized neural codes). The learned H2D representations are utilized for tone decoding in the second stage.
  • Figure 2: Overview of the proposed H2DiLR learning paradigm compared to the heterogeneous learning paradigm. The VQ encoders, decoders, a shared codebook, and private codebooks are learnable and trained in a self-supervised manner during the H2D stage. Note that the VQ decoders are discarded after stage one. In the neural decoding stage, all encoders and codebooks are frozen and used to generate H2D representations, which are then decoded with transformers. The red lines and marks denote loss propagation.
  • Figure 3: Illustration of the proposed Homogeneity-Heterogeneity Disentanglement (H2D) for $m$ subjects, which contains encoders $\{E_i\}_{i=1}^m$, decoders $\{D_i\}_{i=1}^m$, a shared codebook $\mathbb{C}^{S}$, and private codebooks $\{\mathbb{C}_{i}^{P}\}_{i=1}^m$ for quantization. For each sample, $\nu L$ tokens are selected from its embedding and discretized with the shared codebook, while the remaining $(1-\nu)L$ tokens are quantized by the corresponding private codebook. The red lines and marks denote training loss propagation.
  • Figure 4: The anatomy of four participants mapped onto the standard Montreal Neurological Institute template brain, with directions indicated. All chosen contacts for Participant 1 are situated in the left hemisphere, while those for Participants 2 and 3 are in the right hemisphere. Participant 4's selected contacts are distributed across both hemispheres. The brain structures housing these chosen contacts cover most regions associated with speech, including the Superior Temporal Gyrus (STG), Middle Temporal Gyrus (MTG), ventral Sensorimotor Cortex (vSMC), Inferior Frontal Gyrus (IFG), Precentral Gyrus, and Postcentral Gyrus. Additionally, signals are recorded from several subcortical structures, such as the Thalamus, Hippocampus, Insula, and Amygdala.
  • Figure 5: Comparison of UMAP visualizations of neural codes learned by H2D and UPaNT with respect to different tone classes.
  • ...and 1 more figures
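
The token split described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `quantize` and `h2d_quantize` are hypothetical, nearest-neighbour lookup under squared Euclidean distance is assumed as the quantization rule, and, because the caption does not say how the $\nu L$ shared tokens are selected, the sketch simply assumes they are the first `round(nu * L)` tokens of the embedding.

```python
import numpy as np

def quantize(tokens, codebook):
    """Map each token to its nearest code (assumed squared Euclidean distance)."""
    # Pairwise squared distances, shape (num_tokens, num_codes).
    dists = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

def h2d_quantize(z, shared_cb, private_cb, nu):
    """Quantize an (L, d) embedding with a shared and a private codebook.

    Assumption: the first round(nu * L) tokens go to the shared codebook and
    the remaining tokens go to the subject's private codebook.
    """
    n_shared = int(round(nu * z.shape[0]))
    zq_shared, idx_shared = quantize(z[:n_shared], shared_cb)
    zq_private, idx_private = quantize(z[n_shared:], private_cb)
    zq = np.concatenate([zq_shared, zq_private], axis=0)
    return zq, idx_shared, idx_private

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 4))            # L = 8 tokens of dimension 4
    shared_cb = rng.normal(size=(16, 4))   # shared codebook C^S
    private_cb = rng.normal(size=(16, 4))  # private codebook C_i^P for one subject
    zq, idx_s, idx_p = h2d_quantize(z, shared_cb, private_cb, nu=0.5)
    print(zq.shape, len(idx_s), len(idx_p))
```

In a multi-subject setting, each subject $i$ would pass its own private codebook while all subjects reuse the same shared codebook; training losses and gradient handling are outside the scope of this sketch.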