ImKWS: Test-Time Adaptation for Keyword Spotting with Class Imbalance

Hanyu Ding; Yang Xiao; Jiaheng Dong; Ting Dang

TL;DR

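ImKWS is a test-time adaptation method for keyword spotting under class imbalance. It splits the entropy process into a reward branch and a penalty branch with separate update strengths, and enforces consistency across multiple audio transformations to keep model updates stable. Experiments on the Google Speech Commands dataset indicate that ImKWS achieves reliable adaptation in realistic imbalanced scenarios.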

Abstract

Keyword spotting (KWS) identifies words for voice assistants, but environmental noise frequently reduces accuracy. Standard adaptation fixes this issue but strictly requires original or labeled audio. Test-time adaptation (TTA) removes this data constraint by using only unlabeled test audio. However, current methods fail to handle the severe imbalance between rare keywords and frequent background sounds. Consequently, standard entropy minimization (EM) becomes overconfident and heavily biased toward the frequent background class. To overcome this problem, we propose a TTA method named ImKWS. Our approach splits the entropy process into a reward branch and a penalty branch with separate update strengths. Furthermore, we enforce consistency across multiple audio transformations to ensure stable model updates. Experiments on the Google Speech Commands dataset indicate that ImKWS achieves reliable adaptation in realistic imbalanced scenarios. The code is available on GitHub.
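The abstract does not give the loss formulas, so the following is only an illustrative sketch, not the authors' implementation. It relies on one standard identity: for softmax outputs, the entropy equals the negative expected logit plus the log-sum-exp of the logits. Treating the first term as a "reward" and the second as a "penalty", and weighting them with the hypothetical strengths `alpha` and `beta`, is an assumption made here for illustration, as are all function names. The consistency term shown (mean KL divergence from each view's prediction to the average prediction) is likewise one plausible choice, not necessarily the one used in ImKWS.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def entropy(logits):
    """Standard softmax entropy, the quantity minimized by plain EM."""
    logp = log_softmax(logits)
    return -(np.exp(logp) * logp).sum(axis=-1)


def decoupled_entropy(logits, alpha=1.0, beta=1.0):
    """Entropy split into two separately weighted terms (assumed form).

    reward  = -sum_i p_i * z_i   (minimizing it raises the expected logit)
    penalty = logsumexp(z)       (minimizing it holds the logits down)
    With alpha == beta == 1 this reduces exactly to the standard entropy.
    """
    p = np.exp(log_softmax(logits))
    reward = -(p * logits).sum(axis=-1)
    m = logits.max(axis=-1)
    penalty = m + np.log(np.exp(logits - m[..., None]).sum(axis=-1))
    return alpha * reward + beta * penalty


def multiview_consistency(view_logits):
    """Mean KL(p_view || mean prediction) per sample (assumed form).

    view_logits has shape (views, batch, classes); each view is the model
    output for a differently transformed copy of the same audio.
    """
    logp = log_softmax(view_logits)
    p = np.exp(logp)
    mean_p = p.mean(axis=0, keepdims=True)
    kl = (p * (logp - np.log(mean_p + 1e-12))).sum(axis=-1)
    return kl.mean(axis=0)


logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
# Equal weights recover ordinary entropy minimization.
same = np.allclose(decoupled_entropy(logits), entropy(logits))
# Unequal weights change the balance between the two branches.
weighted = decoupled_entropy(logits, alpha=1.0, beta=0.5)
```

Setting `alpha` and `beta` independently is what allows the two branches to be updated with different strengths. How those strengths should be chosen under a given imbalance ratio is not stated in this section.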

Paper Structure (15 sections, 9 equations, 3 figures, 3 tables)

Introduction
Method
Problem Formulation
Proposed ImKWS Method
Decoupled Entropy Minimization
Multi-view Consistency Loss
Overall: Two-Stage Sample Selection and Objective
Experimental Setup
Results
Effect of Adaptation Under Noise with Different SNRs
Adaptive Robustness of Diverse Imbalance Ratios
Ablation Study
Adaptive Robustness and Gradient Stability
Conclusion
Generative AI Use Disclosure

Figures (3)

Figure 1: Overview of the Proposed ImKWS Method, including Two-stage Sample Selection, Decoupled Entropy Minimization and Multi-view Consistency Loss.
Figure 2: Keyword F1 and Non-keyword F1 on MS-SNSD environmental noise at -10 dB.
Figure 3: Gradient Norm without and with consistency on MS-SNSD environmental noise (1:4 & 1:8) at -10 dB.
