PB-LRDWWS System for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin
TL;DR
This work tackles wake-up word spotting for low-resource dysarthric speech by introducing PB-LRDWWS, a system that couples a three-stage HuBERT-based dysarthric content feature extractor with prototype-based classification. Prototypes are built by averaging features extracted from enrollment speech, and each test utterance is classified by the cosine similarity between its features and $11$ prototypes (10 keywords plus non-keyword). The study systematically compares data augmentation (including TTS-based keyword synthesis and Merge_train), loss functions (CTC, CE, and AddSCL), and classification settings (PB-C, KNN-C, and Model prediction). Cross-entropy loss with PB-C generally yields the best final performance, while CTC benefits from SCL and augmentation but can suffer from class imbalance. The PB-LRDWWS system achieves second place on the final Test-B with a score of $0.009801$, demonstrating strong performance in a challenging dysarthric KWS setting and offering a practical approach to personalized wake-up word spotting in assistive devices.
Abstract
For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting (LRDWWS) Challenge, we introduce the PB-LRDWWS system. This system combines a dysarthric speech content feature extractor for prototype construction with a prototype-based classification method. The feature extractor is a HuBERT model fine-tuned in three stages using cross-entropy loss. This fine-tuned HuBERT extracts features from the target dysarthric speaker's enrollment speech to build prototypes. Classification is performed by calculating the cosine similarity between the HuBERT features of the target dysarthric speaker's evaluation speech and the prototypes. Despite its simplicity, experimental results show that the method is effective. Our system achieves second place in the final Test-B of the LRDWWS Challenge.
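The prototype-based classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each utterance has already been reduced to a single fixed-length feature vector (for example by pooling HuBERT frame features over time, which is an assumption here), and the function names, class labels, and toy 3-dimensional vectors are hypothetical.

```python
import numpy as np


def build_prototypes(enrollment):
    """Average the enrollment feature vectors of each class into one prototype.

    `enrollment` maps a class label to a list of utterance-level feature
    vectors (assumed to be precomputed by the feature extractor).
    """
    return {label: np.mean(np.stack(vectors), axis=0)
            for label, vectors in enrollment.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (small epsilon avoids 0/0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def classify(feature, prototypes):
    """Return the label of the most similar prototype and all similarity scores."""
    scores = {label: cosine_similarity(feature, proto)
              for label, proto in prototypes.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for extracted features: two keywords plus a non-keyword class.
# The real system uses 10 keywords plus non-keyword, i.e. 11 prototypes.
enrollment = {
    "keyword_1": [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.0, 0.1])],
    "keyword_2": [np.array([0.0, 1.0, 0.1]), np.array([0.1, 0.9, 0.0])],
    "non_keyword": [np.array([0.0, 0.1, 1.0]), np.array([0.1, 0.0, 0.9])],
}
prototypes = build_prototypes(enrollment)

test_feature = np.array([0.8, 0.2, 0.05])
label, scores = classify(test_feature, prototypes)
print(label, scores)
```

Because the prototypes are simple per-class averages, enrolling a new speaker in this sketch only requires recomputing the averages from that speaker's enrollment utterances, with no further training of the feature extractor.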
