Temporal structure of the language hierarchy within small cortical patches

Julien Gadonneix, Mingfang Zhang, Jérémy Rapin, Linnea Evanson, Pierre Bourdillon, Jean-Rémi King

Abstract

Speech production requires the rapid coordination of a complex hierarchy of linguistic units, transforming a semantic representation into a precise sequence of articulatory movements. To unravel the neural mechanisms underlying this feat, we leverage recordings from eight 3.2 x 3.2 mm 64-microelectrode arrays implanted in the motor cortex and inferior frontal gyrus of two patients tasked with producing twenty thousand sentences. We show that a hierarchy of linguistic features is robustly encoded in most of these small cortical patches. Contrary to our expectations, instead of a clear macroscopic organization between patches, we observe a multiplexing of phonetic, syllabic and lexical representations within each cortical patch. Critically, this coding scheme dynamically changes over time to allow successive phonemes, syllables and words to be simultaneously represented without interference. Overall, these results, reminiscent of position encoding in transformers, show how small cortical patches organize the unfolding of the speech hierarchy during language production.

Paper Structure

This paper contains 50 sections, 10 figures.

Figures (10)

  • Figure 1: Experimental design and microscopic mapping of the speech hierarchy. A. Intracortical neural activity was recorded from two participants (T12 and T15) implanted with a total of eight $64$-channel microelectrode arrays in the motor cortex and IFG. Participants performed a sentence production task following a visual cue (see Methods). B. Neural activity and feature hierarchy with an example sentence from T12. Top: Binned threshold-crossing neural activity for one participant (the data is binarized for visualization). Middle: Three levels of representations are analyzed from binned neural activity ($X$): phonetic features ($39$ categories via one-hot encoding), syllabic features (sub-word embeddings via FastText; Bojanowski et al., 2017) and lexical features (word embeddings via spaCy; Honnibal et al., 2020). Bottom: Performance is summarized using Pearson correlation ($R$) between predicted and actual vectors, where predictions come from the TRF encoding analyses. C. Neural activity encoding scores per electrode for phonemes (top), syllables (middle), and words (bottom). Electrodes highlighted in color indicate significant encoding scores ($p \leq 0.05$, one-sided $t$-test across folds), with colors corresponding to the arrays shown in A. Sparse representations of phonemes, syllables, and words are observed in language areas (44, 55b), while dense, highly predictable representations are found in premotor areas (6v, d6v).
  • Figure 2: High-resolution encoding and decoding of the speech hierarchy in the motor cortex and IFG. A. Competing anatomical hypotheses for the speech hierarchy. Left: A hierarchical segregation model, where distinct cortical regions (blue for phonetic, green for syllabic and red for lexical) code for different linguistic levels. Right: The overlapping neural mosaic model, where all representations are encoded in the same localized neural populations. B. Cortical map of all the Utah arrays. C-E. Neural activity in Hz from representative electrodes aligned to the onset of phonemes (C), syllables (D) or words (E), demonstrating sharp tuning to linguistic units at different levels; significance is determined by two-sided Mann-Whitney $U$-tests across splits for $p < 10^{-4}$. F-H. Cortical map of phoneme (F), syllable (G) or word (H) peak decoding score within a $10$ s temporal window around onset, showing the eight Utah arrays. I-K. Time-resolved decoding scores ($R$) for phonemes (I), syllables (J) or words (K) for all eight Utah arrays; significance is indicated by dots at each timestamp and array, based on one-sided $t$-tests across splits for $p < 10^{-11}$ ($10^{-7}$ for syllables). Color coding is consistent with panels F-H. L. Normalized decoding scores from the 6v inf array only for words, syllables and phonemes. M. Comparison of representation duration, showing that word-level information persists significantly longer than phoneme-level information ($t_{Wo} - t_{Ph} = 0.99 \pm 0.15$ s; $p < 0.05$, two-sided Wilcoxon signed-rank test and SEM across electrodes with FDR correction). Each triplet of dots represents a Utah array. Ph: phoneme. Sy: syllable. Wo: word. N. Linear regression of word versus phoneme peak decoding performance across arrays ($p < 0.05$, two-sided Wald test with $t$-distribution; $R^2 = 0.58$).
  • Figure 3: Hierarchical overlapping neural representations. A-C. Overlap of decoding scores in neural representations for successive phonemes (A), syllables (B) or words (C) aligned to the onset of the current event. These plots represent scores (top) and normalized scores (middle) for decoders trained on the $i^{th}$ ($i \in \{-4, -3, ..., 4\}$) phoneme, syllable or word but time-locked on the $0^{th}$. Colored bars (bottom) represent the values above the median. Results are derived from all electrode data and averaged across patients, with error representing the SEM. The example phonemes, syllables and words at the top are used for illustration purposes. D-F. Cortical maps showing the number of phonemes (D), syllables (E) and words (F) significantly decoded simultaneously at $t = 0$ s for each Utah array.
  • Figure 4: Dynamic neural trajectories coordinate the speech hierarchy. A. Schematic representation of static versus dynamic neural codes. In a static code, a feature is represented by a fixed pattern of activity, predicting a square time-generalization matrix; in a dynamic code, the representation evolves over time, predicting a diagonal matrix. B-D. Temporal generalization matrices for phonemes (B), syllables (C) and words (D). Contours enclose scores exceeding the mean by $1.5$ standard deviations. E. Same as B-D, combined in a single plot. F-H. Performance of decoders trained at specific time points ($D_{-1s}$, $D_{0s}$, $D_{1s}$) and tested across time for phonemes (F), syllables (G) and words (H); significance is determined by two-sided Wilcoxon signed-rank tests across splits for $p < 10^{-10}$. I-K. Temporal generalization matrices for phonemes (I), syllables (J) and words (K) for successive speech units ($+1$, $0$, $-1$). Contours enclose scores exceeding the mean by $1.5$ standard deviations. L-N. 3D visualization of the neural code trajectories until $1.5$ s after phoneme (L), syllable (M) or word (N) onset for successive speech units ($+1$, $0$, $-1$). The axes represent normalized decoding scores for decoders trained at specific time points ($D_{-1s}$, $D_{0s}$, $D_{1s}$). O. Temporal generalization for each individual Utah array. Contours enclose scores exceeding the mean by $1.5$ standard deviations. P. Cortical map of all the Utah arrays (reproduced from Figure 2B for clarity).
  • Figure :
  • ...and 5 more figures
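
The distinction drawn in Figure 4A between static and dynamic codes can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it assumes a ridge-regression decoder, synthetic data, and an idealized dynamic code in which the activity pattern at each time point is orthogonal to the others. All function names (`ridge_fit`, `pearson_r`, `temporal_generalization`, `simulate`) and parameter values are hypothetical. A decoder is trained at each time point and tested at every other one, and each train/test pair is scored with the Pearson correlation ($R$) between predicted and actual values.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: weights w such that X @ w approximates y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def temporal_generalization(X, y, n_train, alpha=1.0):
    """X: (n_trials, n_channels, n_times); y: (n_trials,).
    scores[i, j] is the R of a decoder trained at time i and tested at time j,
    using the first n_train trials for fitting and the rest for testing."""
    n_times = X.shape[2]
    scores = np.zeros((n_times, n_times))
    for i in range(n_times):
        w = ridge_fit(X[:n_train, :, i], y[:n_train], alpha)
        for j in range(n_times):
            scores[i, j] = pearson_r(X[n_train:, :, j] @ w, y[n_train:])
    return scores

def simulate(dynamic, n_trials=400, n_channels=20, n_times=6, noise=0.5, seed=0):
    """Synthetic recordings in which a scalar feature y drives a channel pattern.
    Static code: the same unit-norm pattern at every time point.
    Dynamic code (idealized assumption): mutually orthogonal patterns over time."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_trials)
    if dynamic:
        # Orthonormal columns, one pattern per time point.
        patterns, _ = np.linalg.qr(rng.standard_normal((n_channels, n_times)))
    else:
        p = rng.standard_normal(n_channels)
        p /= np.linalg.norm(p)
        patterns = np.repeat(p[:, None], n_times, axis=1)
    X = y[:, None, None] * patterns[None, :, :]
    X = X + noise * rng.standard_normal(X.shape)
    return X, y

static_scores = temporal_generalization(*simulate(dynamic=False), n_train=300)
dynamic_scores = temporal_generalization(*simulate(dynamic=True), n_train=300)

# Mask selecting train/test pairs at different time points.
off_diag = ~np.eye(static_scores.shape[0], dtype=bool)
print("static, off-diagonal mean R: ", static_scores[off_diag].mean())
print("dynamic, diagonal mean R:    ", np.diag(dynamic_scores).mean())
print("dynamic, off-diagonal mean |R|:", np.abs(dynamic_scores[off_diag]).mean())
```

Under these assumptions, the static simulation generalizes across all train/test time pairs (the "square" matrix), while the dynamic simulation scores well only when training and testing times coincide (the "diagonal" matrix). Real neural patterns are unlikely to be exactly orthogonal, so empirical matrices would be expected to fall between these two extremes.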