Psychophysiology-aided Perceptually Fluent Speech Analysis of Children Who Stutter

Yi Xiao, Harshit Sharma, Victoria Tumanova, Asif Salekin

TL;DR

PASAD addresses the challenge of distinguishing perceptually fluent speech in preschoolers who stutter by integrating real-time psychophysiology into a Hyper-LSTM framework. Using Mel-spectrograms as the primary speech representation, PASAD dynamically generates timestamp-specific network weights from physiological inputs, enabling adaptive analysis of speech-motor-control factors. The approach demonstrates strong classification performance, robust ablation results, and real-time edge deployment potential, while providing interpretable insights via Kernel SHAP into formant and F0-related attributes linked to stuttering mechanisms. This work offers a foundation for just-in-time, data-driven interventions and advances understanding of dynamic speech-motor-control in early stuttering development.

Abstract

This paper presents PASAD, a novel approach that detects changes in the perceptually fluent speech acoustics of young children. In particular, analyzing perceptually fluent speech enables identifying the speech-motor-control factors that are considered the underlying cause of stuttering disfluencies. Recent studies indicate that the speech production of young children, especially those who stutter, may be adversely affected by situational physiological arousal. A major contribution of this paper is leveraging the speaker's situational physiological responses in real time to analyze the speech signal effectively. PASAD adapts a Hyper-Network structure to extract temporal speech importance information by leveraging physiological parameters. Moreover, we collected speech and physiological sensing data from 73 preschool-age children who stutter (CWS) and children who do not stutter (CWNS) in different conditions. PASAD's unique architecture enables identifying speech attributes distinct to a CWS's fluent speech and mapping them to the speaker's respective speech-motor-control factors. The extracted knowledge can enhance understanding of children's speech-motor-control and stuttering development. Our comprehensive evaluation shows that PASAD outperforms state-of-the-art multi-modal baseline approaches in different conditions; is expressive and adaptive to the speaker's speech and physiology; and is generalizable, robust, and executable in real time.
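
To make the hyper-network idea in the abstract concrete, the following is a minimal sketch of a recurrent cell whose weights are rescaled at every timestep by a physiological input. It is not PASAD's architecture: a plain tanh RNN stands in for the LSTM, the hyper-network is a single linear layer, and every name (`HyperRNNCell`, `scales`, `run_sequence`) and dimension is hypothetical, chosen only for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HyperRNNCell:
    """Toy recurrent cell whose weights are rescaled per timestep by physiology.

    Assumption: one scale per hidden unit, produced by a linear hyper-network.
    """

    def __init__(self, n_speech, n_physio, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W_x = rand_matrix(n_hidden, n_speech, rng)  # speech -> hidden
        self.W_h = rand_matrix(n_hidden, n_hidden, rng)  # hidden -> hidden
        self.W_z = rand_matrix(n_hidden, n_physio, rng)  # physiology -> scales

    def scales(self, physio):
        # 1 + tanh(.) lies in (0, 2); an all-zero physiology vector gives scale 1.
        return [1.0 + math.tanh(z) for z in matvec(self.W_z, physio)]

    def step(self, speech, physio, h):
        d = self.scales(physio)
        pre_x = matvec(self.W_x, speech)
        pre_h = matvec(self.W_h, h)
        # Scaling row i of W_x and W_h by d[i] equals scaling pre-activation i.
        return [math.tanh(d[i] * (pre_x[i] + pre_h[i])) for i in range(len(h))]


def run_sequence(cell, speech_frames, physio_frames, n_hidden):
    """Run the cell over aligned speech and physiology frames."""
    h = [0.0] * n_hidden
    for speech, physio in zip(speech_frames, physio_frames):
        h = cell.step(speech, physio, h)
    return h
```

Running the same speech frames with two different physiology sequences yields different hidden states, which is the sense in which the effective weights are timestep-specific in this toy version.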

Paper Structure

This paper contains 28 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Mel-spectrogram for a $5$ sec speaking window
  • Figure 2: PASAD's architecture.
  • Figure 3: The Nonlocal Block. $\otimes$ and $\oplus$ denote matrix multiplication and element-wise sum, respectively. "Conv" denotes a $1\times1$ convolution layer. $C$ is the number of channels, and $H$ and $W$ are the height and width of the input feature.
  • Figure 4: $LSTM_{main}$ gate-weight visualization with the respective change-score features and Mel-spectrogram. Time is in seconds, and darker blue indicates a more intense change in gate weight.
  • Figure 5: Visualization of important spectrogram coordinates for CWS speech samples
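
The Figure 3 caption describes the Nonlocal Block through $1\times1$ convolutions, matrix multiplication, and an element-wise sum. The sketch below shows one common way those pieces fit together, under stated assumptions: the input is flattened to $N = H \times W$ positions with $C$ channels each, and attention scores are normalised with a softmax. The caption does not specify the normalisation, so the softmax and all function and parameter names are hypothetical.

```python
import math


def linear(W, v):
    """A 1x1 convolution at one position: a linear map across channels."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nonlocal_block(x, W_theta, W_phi, W_g, W_out):
    """x: list of N = H*W positions, each a list of C channel values.

    W_theta, W_phi, W_g map C channels to C_inner; W_out maps C_inner back to C.
    """
    theta = [linear(W_theta, v) for v in x]
    phi = [linear(W_phi, v) for v in x]
    g = [linear(W_g, v) for v in x]
    out = []
    for i, x_i in enumerate(x):
        # Matrix multiplication: row i of theta times phi transposed,
        # then normalised over all positions (softmax is an assumption).
        attn = softmax([sum(a * b for a, b in zip(theta[i], p_j)) for p_j in phi])
        # Matrix multiplication: attention row times g, a weighted sum of positions.
        y = [sum(attn[j] * g[j][k] for j in range(len(x))) for k in range(len(g[0]))]
        # Final 1x1 convolution back to C channels, then element-wise sum with input.
        z = linear(W_out, y)
        out.append([a + b for a, b in zip(x_i, z)])
    return out
```

Because of the element-wise sum, the output keeps the input's shape, and setting `W_out` to zeros reduces the block to the identity.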