SSDM: Scalable Speech Dysfluency Modeling

Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Mille, Maria Luisa Gorno Tempini, Gopala Krishna Anumanchipalli

TL;DR

SSDM adopts articulatory gestures as scalable forced alignment, introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment, introduces a large-scale simulated dysfluency corpus called Libri-Dys, and develops an end-to-end system that leverages large language models (LLMs).

Abstract

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, it faces three challenges. First, current state-of-the-art solutions \cite{lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm} scale poorly. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose \textit{SSDM: Scalable Speech Dysfluency Modeling}, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system that leverages large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at \url{https://berkeley-speech-group.github.io/SSDM/}.

Paper Structure (70 sections, 16 equations, 15 figures, 7 tables, 2 algorithms)

Figures (15)

  • Figure 1: SSDM. Comparison to other methods
  • Figure 2: SSDM architecture
  • Figure 3: LSA (LCS) delivers dysfluent alignment that is more semantically aligned.
  • Figure 4: CSA
  • Figure 5: Gestural Dysfluency Visualization
  • ...and 10 more figures
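To make the longest-common-subsequence (LCS) alignment named in Figure 3 more concrete, here is a minimal sketch of classical LCS alignment between a reference phoneme sequence and a dysfluent spoken one. It is not the paper's CSA, which is described here only at a high level; the function name `lcs_align`, the ARPAbet-style phoneme tokens, and the "match"/"extra"/"missing" tags are illustrative assumptions. Phonemes outside the LCS surface as candidate dysfluencies: "extra" for repetitions or insertions, "missing" for deletions.

```python
from typing import List, Tuple


def lcs_align(reference: List[str], spoken: List[str]) -> List[Tuple[str, str]]:
    """Align a spoken phoneme sequence to a reference via longest common subsequence.

    Hypothetical helper for illustration, not the SSDM/CSA implementation.
    Returns (tag, phoneme) pairs, where tag is:
      "match"   - phoneme is part of the LCS (present in both sequences)
      "extra"   - phoneme appears only in the spoken sequence (e.g. a repetition)
      "missing" - phoneme appears only in the reference (e.g. a deletion)
    """
    n, m = len(reference), len(spoken)
    # dp[i][j] = LCS length of the suffixes reference[i:] and spoken[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if reference[i] == spoken[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal alignment.
    i = j = 0
    path: List[Tuple[str, str]] = []
    while i < n and j < m:
        if reference[i] == spoken[j]:
            path.append(("match", reference[i]))
            i += 1
            j += 1
        elif dp[i][j + 1] >= dp[i + 1][j]:
            # Skipping the spoken phoneme keeps the LCS length: treat it as extra.
            path.append(("extra", spoken[j]))
            j += 1
        else:
            # Skipping the reference phoneme keeps the LCS length: treat it as missing.
            path.append(("missing", reference[i]))
            i += 1
    # Whatever remains on either side lies outside the LCS.
    path.extend(("missing", p) for p in reference[i:])
    path.extend(("extra", p) for p in spoken[j:])
    return path


if __name__ == "__main__":
    # "please" with a part-word repetition: P L P L IY Z
    for tag, phone in lcs_align(["P", "L", "IY", "Z"], ["P", "L", "P", "L", "IY", "Z"]):
        print(tag, phone)
    # match P, match L, extra P, extra L, match IY, match Z
```

The tie-breaking rule (prefer skipping a spoken phoneme when both choices preserve the LCS length) is one arbitrary choice; it labels the second "P L" in the repetition as extra, while a different rule could label the first one instead.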