PCOV-KWS: Multi-task Learning for Personalized Customizable Open Vocabulary Keyword Spotting

Jianan Pan; Kejie Huang

Abstract

As advances in technologies such as the Internet of Things (IoT), Automatic Speech Recognition (ASR), Speaker Verification (SV), and Text-to-Speech (TTS) drive wider use of intelligent voice assistants, the demand for privacy and personalization has grown. In this paper, we introduce a multi-task learning framework for personalized, customizable open-vocabulary Keyword Spotting (PCOV-KWS). The framework uses a lightweight network to perform Keyword Spotting (KWS) and SV simultaneously, addressing personalized KWS requirements. In place of a softmax-based loss, we adopt a training criterion that transforms multi-class classification into multiple binary classifications, which eliminates inter-category competition, and we employ an optimization strategy for multi-task loss weighting during training. We evaluated PCOV-KWS on multiple datasets; it outperforms the baselines while requiring fewer parameters and lower computational resources.
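To illustrate what turning a multi-class problem into multiple binary classifications can look like, here is a minimal sketch of a one-vs-rest loss over cosine similarities. It is written in the spirit of SphereFace2 but is not the paper's exact formulation: the function names and the hyperparameter values (`scale`, `margin`, `bias`, `lam`) are assumptions chosen for illustration. Each class contributes its own binary term and no softmax normalization couples the classes.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binary_cosine_loss(embedding, class_weights, target,
                       scale=30.0, margin=0.2, bias=0.0, lam=0.7):
    """One-vs-rest loss: one binary term per class, no softmax.

    All hyperparameter defaults are illustrative assumptions,
    not values taken from the paper.
    """
    loss = 0.0
    for j, weight in enumerate(class_weights):
        cos_j = cosine(embedding, weight)
        if j == target:
            # Positive class: push the cosine above the margin.
            loss += lam / scale * softplus(-(scale * (cos_j - margin) + bias))
        else:
            # Negative classes: push each cosine below the negative margin.
            loss += (1.0 - lam) / scale * softplus(scale * (cos_j + margin) + bias)
    return loss


# Hypothetical toy data: three class prototypes in 2-D.
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [2.0, 0.1]  # nearly aligned with class 0

loss_correct = binary_cosine_loss(embedding, class_weights, target=0)
loss_wrong = binary_cosine_loss(embedding, class_weights, target=1)
```

In this toy setup the embedding points almost exactly at the class 0 prototype, so labelling it as class 0 gives a smaller loss than labelling it as class 1. Because each term depends only on its own class's cosine, changing one prototype changes only that class's term.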

Paper Structure (18 sections, 10 equations, 3 figures, 4 tables)

Introduction
Proposed Approach
Multi-task Learning Architecture
Large-scale Training Dataset
Audio Encoder
Experiments
Experimental Setups
Evaluation Datasets
Evaluation metric
Performance Analysis on Training Strategy
Ablation Studies
Effectiveness of PCOV-KWS Framework
Effectiveness of TDResNeXt
Impact of CIB
Comparison with Baselines
...and 3 more sections

Figures (3)

Figure 1: Proposed architecture of PCOV-KWS. The architecture comprises an audio encoder (a shared encoder followed by two linear sub-encoders, one for KWS and one for SV) and cosine classifiers integrated with SphereFace2 for metric learning.
Figure 2: Performance Analysis on Training Strategy
Figure 3: Evaluation results according to the number of words in a LibriPhrase evaluation set.
