Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

Shiyao Wang; Shiwan Zhao; Jiaming Zhou; Aobo Kong; Yong Qin

TL;DR

This work introduces a prototype-based approach that markedly improves dysarthric speech recognition (DSR) for unseen dysarthric speakers without additional fine-tuning, and it incorporates supervised contrastive learning to refine feature extraction.

Abstract

Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, which leads to severe performance degradation when DSR models are applied to new dysarthric speakers. Traditional speaker adaptation methods typically fine-tune a model for each speaker, but this strategy is costly and inconvenient for disabled users because it requires substantial data collection. To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning. Our method employs a feature extractor fine-tuned from HuBERT to produce per-word prototypes that encapsulate the characteristics of previously unseen speakers. These prototypes serve as the basis for classification. Additionally, we incorporate supervised contrastive learning to refine feature extraction. By enhancing representation quality, we further improve DSR performance, enabling effective personalized DSR. We release our code at https://github.com/NKU-HLT/PB-DSR.
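
To make the prototype idea concrete, the following is a minimal sketch and not the released implementation. It assumes each utterance has already been reduced to a fixed-length feature vector (in the paper these would come from the HuBERT-based extractor; here they are toy 2-D vectors), that a prototype is the mean of a speaker's enrollment features for one word, and that classification picks the prototype with the highest cosine similarity. The function names `build_prototypes` and `classify`, the mean pooling, and the cosine metric are all illustrative assumptions.

```python
import numpy as np


def build_prototypes(features, labels):
    """Average the enrollment features of each word into one prototype.

    Assumption: mean pooling per word; the paper may aggregate differently.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    words = np.unique(labels)  # sorted unique word labels
    protos = np.stack([features[labels == w].mean(axis=0) for w in words])
    return words, protos


def classify(query, words, protos):
    """Return the word whose prototype is most similar to the query.

    Assumption: cosine similarity as the matching score.
    """
    query = np.asarray(query, dtype=float)
    q = query / np.linalg.norm(query)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return words[int(np.argmax(p @ q))]


# Toy enrollment data for one unseen speaker: two samples per word.
enroll_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
enroll_labels = ["yes", "yes", "no", "no"]

words, protos = build_prototypes(enroll_feats, enroll_labels)
prediction = classify([0.8, 0.2], words, protos)
```

Because the prototypes are computed from a few enrollment samples rather than learned by gradient descent, adapting to a new speaker in this sketch only requires calling `build_prototypes` again; no model weights change.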

Paper Structure (13 sections, 2 equations, 2 figures, 2 tables)

Introduction
Proposed Methods
  Prototype-Based DSR
  Combining CTC Loss with SCL Loss
Experiments
  Dataset
  Experimental Settings
    The Settings of DSR Models
    The Settings of Prototype-Based DSR
  Results and Discussions
  Visualization
Conclusions
Acknowledgments

Figures (2)

Figure 1: Prototype-based DSR comprises three stages: fine-tuning HuBERT for feature extraction, building per-word prototypes, and prototype-based classification.
Figure 2: Visualizations of speech feature distributions: (a) Seen speaker in V. (b) Unseen speaker in R. (c) Unseen speaker in R+. (d) Enhancing R+ by PB-DSR+. Black dots in (d) represent prototypes, while green and red dots denote samples correctly and incorrectly classified by R+, respectively, with their labels from R+ predictions. A red line connects a red dot to its correct prototype by PB-DSR+. Each label is the word ID in the UASpeech dataset.
