ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody

Jianan Pan; Yuanming Zhang; Kejie Huang

Abstract

Current keyword spotting systems primarily use phoneme-level matching to distinguish confusable words but ignore user-specific pronunciation traits such as prosody (intonation, stress, rhythm). This paper presents ProKWS, a novel framework integrating fine-grained phoneme learning with personalized prosody modeling. We design a dual-stream encoder in which one stream derives robust phonemic representations through contrastive learning, while the other extracts speaker-specific prosodic patterns. A collaborative fusion module dynamically combines phonemic and prosodic information, enhancing adaptability across acoustic environments. Experiments show that ProKWS delivers highly competitive performance, comparable to state-of-the-art models on standard benchmarks, and demonstrates strong robustness for personalized keywords with tone and intent variations.

Paper Structure (15 sections, 8 equations, 3 figures, 4 tables)

Introduction
Proposed method
Dual-Stream Encoder
Collaborative Fusion Module
Training Criterion
Experiments Configuration
Experimental Setups
Evaluation Datasets and Metrics
Implementation details
Results and Analysis
Comparative Evaluation of ProKWS
Visual Analysis
Ablation Studies of ProKWS
Conclusion
Acknowledgements

Figures (3)

Figure 1: Overall architecture of the proposed ProKWS.
Figure 2: t-SNE visualization of prosodic signatures across different accents and intents.
Figure 3: Score variation analysis for continuous intent change. The x-axis represents the interpolation coefficient $\alpha$ between imperative and interrogative prosody, and the y-axis represents the resulting score $s(\alpha)$.
