Condition-Invariant fMRI Decoding of Speech Intelligibility with Deep State Space Model

Ching-Chih Sung, Shuntaro Suzuki, Francis Pingfan Chien, Komei Sugiura, Yu Tsao

TL;DR

This study tackles decoding speech intelligibility from fMRI data across acoustically distinct conditions (noisy vs enhanced speech) to test for condition-invariant neural codes. It introduces a deep state-space model architecture based on a bidirectional S5 variant, tailored to long, high-dimensional fMRI sequences and evaluated ROI-wise. Results show the proposed method outperforms traditional baselines and demonstrates cross-condition transfer, indicating a shared neural representation across conditions, particularly in temporal and frontal-parietal regions. The findings advance understanding of abstract linguistic representations in the brain and point toward brain-informed strategies for improving speech intelligibility in degraded listening environments.

Abstract

Clarifying the neural basis of speech intelligibility is critical for computational neuroscience and digital speech processing. Recent neuroimaging studies have shown that intelligibility modulates cortical activity beyond simple acoustics, primarily in the superior temporal and inferior frontal gyri. However, previous studies have been largely confined to clean speech, leaving it unclear whether the brain employs condition-invariant neural codes across diverse listening environments. To address this gap, we propose a novel architecture built upon a deep state space model for decoding intelligibility from fMRI signals, specifically tailored to their high-dimensional temporal structure. We present the first attempt to decode intelligibility across acoustically distinct conditions, showing our method significantly outperforms classical approaches. Furthermore, region-wise analysis highlights contributions from auditory, frontal, and parietal regions, and cross-condition transfer indicates the presence of condition-invariant neural codes, thereby advancing understanding of abstract linguistic representations in the brain.
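The cross-condition transfer test mentioned in the abstract can be illustrated with a toy sketch: fit a decoder on trials from one acoustic condition and score it on trials from the other. Everything below is an assumption made for illustration and is not the paper's method. The data are synthetic, the labels are treated as binary (intelligible vs. not), a nearest-centroid classifier stands in for the deep state space model, and the names `make_condition`, `transfer_accuracy`, and the per-condition mean-centering step are hypothetical.

```python
import numpy as np

def make_condition(rng, n_trials, pattern, offset, signal=2.0):
    """Synthetic ROI features: a shared label pattern plus a condition-specific offset."""
    labels = np.arange(n_trials) % 2              # balanced binary labels (assumed)
    signs = 2.0 * labels - 1.0                    # map {0, 1} to {-1, +1}
    noise = rng.standard_normal((n_trials, pattern.size))
    feats = signs[:, None] * signal * pattern + offset + noise
    return feats, labels

def fit_centroids(feats, labels):
    """Mean feature vector per class (row 0 = class 0, row 1 = class 1)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, feats):
    """Assign each trial to the nearest class centroid (Euclidean distance)."""
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def transfer_accuracy(train, test):
    """Train on one condition, evaluate on the other."""
    x_tr, y_tr = train
    x_te, y_te = test
    # Center each condition separately so a constant condition offset
    # does not masquerade as label information (an illustrative choice).
    x_tr = x_tr - x_tr.mean(axis=0)
    x_te = x_te - x_te.mean(axis=0)
    centroids = fit_centroids(x_tr, y_tr)
    return float((predict(centroids, x_te) == y_te).mean())

rng = np.random.default_rng(0)
pattern = rng.standard_normal(20)
pattern /= np.linalg.norm(pattern)                # shared "condition-invariant" direction

noisy = make_condition(rng, 200, pattern, offset=rng.standard_normal(20))
enhanced = make_condition(rng, 200, pattern, offset=rng.standard_normal(20))

acc_noisy_to_enh = transfer_accuracy(noisy, enhanced)
acc_enh_to_noisy = transfer_accuracy(enhanced, noisy)
print(f"noisy -> enhanced: {acc_noisy_to_enh:.2f}")
print(f"enhanced -> noisy: {acc_enh_to_noisy:.2f}")
```

In this toy setup, above-chance accuracy in both transfer directions arises only because the two conditions were generated with the same label pattern. That is the logic behind reading successful transfer as evidence for a shared code, although real fMRI analyses need proper cross-validation and significance testing.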

Paper Structure

This paper contains 13 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Pipeline for fMRI-based decoding of speech intelligibility across acoustically distinct conditions.
  • Figure 2: Correlation between STOI and perceived speech intelligibility across conditions.
  • Figure 3: Overview of the proposed architecture.
  • Figure 4: Visualization of significant ROIs in speech intelligibility decoding. (a) Whole-brain MVPA results (family-wise error corrected, $p < 0.001$). (b) Top five ROIs with the highest decoding performance in the Noisy condition.