Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech

Maryam Maghsoudi, Rupesh Chillale, Shihab A. Shamma

Abstract

We investigated the relationship among the neural representations of vocalized, mimed, and imagined speech using publicly available stereotactic EEG (sEEG) recordings. Most prior studies have focused on decoding speech responses within each condition separately. Here, we instead examine how responses relate across conditions by training a linear spectrogram reconstruction model for each condition and evaluating its generalization to the other conditions. We demonstrate that linear decoders trained on one condition generally transfer successfully to the others, implying shared speech representations. We further assessed this commonality at the level of individual stimuli with a rank-based discriminability analysis, which showed that stimulus-specific structure is preserved both within and across conditions. Finally, we compared the linear reconstructions to those from a nonlinear neural network. While both exhibited cross-condition transfer, the linear models achieved superior stimulus-level discriminability.
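
To make the train-on-one, test-on-all design concrete, the following is a minimal sketch on synthetic data, not the authors' pipeline. It assumes a closed-form ridge regression as the linear decoder and the mean Pearson correlation across spectrogram bins as the score. The regularization strength `alpha`, the array sizes, the per-condition noise levels, and the helper names `fit_ridge` and `mean_correlation` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def mean_correlation(Y_true, Y_pred):
    """Mean Pearson correlation across spectrogram bins (columns)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
n_time, n_channels, n_bins = 2000, 16, 8  # hypothetical sizes
conditions = ["vocalized", "mimed", "imagined"]

# Synthetic stand-in for sEEG: every condition shares one linear mapping A
# from spectrogram to neural activity and differs only in noise level.
# Both the shared mapping and the noise levels are assumptions.
A = rng.standard_normal((n_bins, n_channels))
noise_sd = {"vocalized": 0.5, "mimed": 1.0, "imagined": 2.0}
data = {}
for cond in conditions:
    S = rng.standard_normal((n_time, n_bins))  # spectrogram frames
    X = S @ A + noise_sd[cond] * rng.standard_normal((n_time, n_channels))
    data[cond] = (X, S)

# Fit one decoder per condition on the first half of the data,
# then score it on the held-out second half of every condition.
half = n_time // 2
scores = {}
for train in conditions:
    X_train, S_train = data[train]
    W = fit_ridge(X_train[:half], S_train[:half])
    for test in conditions:
        X_test, S_test = data[test]
        scores[(train, test)] = mean_correlation(S_test[half:], X_test[half:] @ W)

for (train, test), r in scores.items():
    print(f"train={train:9s} test={test:9s} r={r:.2f}")
```

Because the synthetic conditions share the mapping `A` by construction, the off-diagonal (cross-condition) scores stay well above zero here. In real recordings, whether they do is the empirical question the paper addresses.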

Paper Structure (15 sections, 6 equations, 4 figures)

This paper contains 15 sections, 6 equations, 4 figures.

Figures (4)

  • Figure 1: Linear and nonlinear decoding approaches. Left: Condition-specific linear decoders are trained on Vocalized, Mimed, or Imagined sEEG and evaluated both within and across conditions. Right: Nonlinear neural network architecture used for spectrogram reconstruction in He2025VocalMind.
  • Figure 2: Envelope correlation distributions for linear (left) and neural network (right) decoders. Rows indicate training condition. Colored curves show performance across test conditions; gray distributions represent null models obtained by shuffling. Dashed lines mark mean correlations.
  • Figure 3: Sentence-level discriminability measured by AUC above chance for linear (left) and neural network (right) decoders. Positive values (red) reflect performance above chance.
  • Figure 4: Relationship between mean envelope reconstruction correlation and sentence-level discriminability for linear and neural network decoders. Each point represents a training–test pair (circles: linear; squares: neural network). Dashed and solid lines show the corresponding linear fits.
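
The "AUC above chance" measure in the Figure 3 caption can be illustrated with a small rank-based sketch. The exact procedure is not given in this summary, so the definition below is an assumption: each reconstructed envelope is correlated with the true envelope of every sentence, the per-sentence score is the fraction of wrong sentences that correlate less strongly than the correct one, and 0.5 (chance) is subtracted from the mean. The function names and the synthetic envelopes are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def auc_above_chance(reconstructed, targets):
    """Mean rank-based AUC over sentences, minus the chance level 0.5.

    For sentence i, the AUC is the fraction of non-matching targets whose
    correlation with reconstruction i is lower than that of the matching
    target (ties count as one half).
    """
    per_sentence = []
    for i, rec in enumerate(reconstructed):
        sims = [pearson(rec, target) for target in targets]
        others = [s for j, s in enumerate(sims) if j != i]
        wins = sum(1.0 if sims[i] > s else 0.5 if sims[i] == s else 0.0
                   for s in others)
        per_sentence.append(wins / len(others))
    return sum(per_sentence) / len(per_sentence) - 0.5

random.seed(0)
n_sentences, n_samples = 20, 100  # hypothetical sizes

# Synthetic "true" envelopes and noisy reconstructions of them.
targets = [[random.gauss(0, 1) for _ in range(n_samples)]
           for _ in range(n_sentences)]
noisy = [[v + random.gauss(0, 1) for v in target] for target in targets]

score = auc_above_chance(noisy, targets)
print(f"AUC above chance: {score:.2f}")
```

Under this definition the value ranges from -0.5 to 0.5, with 0 meaning the correct sentence ranks no better than a random one.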