No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting

Yi Liu; Chuan-Che Jeff Huang; Xiao Quan

TL;DR

This work tackles prefix bias in open-vocabulary keyword spotting (OV-KWS), where a model over-weights the first phonemes of an enrollment and falsely accepts queries that share its prefix. It introduces the Partial Overlap Benchmark (POB) to stress-test prefix-sharing cases and the Equal-weighting Position Scoring (EPS) module to counter position-biased scoring. EPS reduces the emphasis on early positions, improving robustness to partially overlapping enrollment-query pairs, while POB supplies both a realistic evaluation regime and training data for cross-domain generalization. Used alone, EPS lowers EER on POB-Spark from 64.4% to 29.3% and raises accuracy on POB-LP, whose pairs share longer prefixes than LibriPhrase, from 87.6% to 96.8%. Adding POB data to training yields further POB gains at the cost of some standard-benchmark performance. Combining EPS and POB gives the best cross-domain balance, but the trade-off is largest for short, single-word commands, motivating future work on data balance and on weighting strategies suited to phrases of different lengths.
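The EER figures quoted above refer to the operating point at which the false-accept rate on negative (mismatched) pairs equals the false-reject rate on positive pairs. The following is a minimal sketch, assuming higher scores mean "match" and using a simple sweep over the observed scores as thresholds rather than ROC interpolation; the paper's exact EER procedure is not given here, and the function name `equal_error_rate` is hypothetical.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by trying each observed score as a decision threshold.

    scores: detection scores, higher = more likely a matching pair (assumed)
    labels: 1 for positive (matching) pairs, 0 for negative pairs
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # A pair is accepted when its score is >= t.
        far = sum(s >= t for s in neg) / len(neg)  # negatives wrongly accepted
        frr = sum(s < t for s in pos) / len(pos)   # positives wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores: one negative pair outscores one positive pair.
print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # → 0.5
```

With few pairs the FAR and FRR curves move in coarse steps, so averaging them at the closest crossing is a common practical approximation.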

Abstract

Open-vocabulary keyword spotting (OV-KWS) enables personalized device control via arbitrary voice commands. Recent work has explored audio-text joint embeddings, which let users enroll phrases as text, and has proposed techniques to disambiguate similar utterances. We find that existing OV-KWS solutions often over-weight the beginning phonemes of an enrollment, causing false triggers when a negative enrollment-query pair shares a prefix ("turn the volume up" vs. "turn the volume down"). We trace this to two factors: training data bias and position-biased cross-modal scoring. To address these limitations, we introduce the Partial Overlap Benchmark (POB), consisting of two datasets, POB-Spark and POB-LibriPhrase (POB-LP), that contain mismatched audio-text pairs with shared prefixes, and we propose Equal-weighting Position Scoring (EPS), a lightweight decision layer. Using EPS alone reduces EER on POB-Spark from 64.4% to 29.3% and improves POB-LP accuracy from 87.6% to 96.8%, while maintaining performance on LibriPhrase and Google Speech Commands (GSC). With POB data added in training, our approach achieves the best POB benchmark results while incurring the least degradation on prior metrics among the baselines. This degradation is most pronounced on GSC, which contains only one-word commands. We leave mitigating this trade-off to future work.
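The abstract does not spell out how the EPS layer is computed, so the following is only a toy illustration of why equal position weighting helps with prefix-sharing negatives, not the authors' module. It assumes a hypothetical scorer that yields one match score per enrollment position, and it compares an early-position-heavy weighting (`w_i = gamma**i`, an assumed stand-in for position-biased scoring) against equal weights. All function names and match values are invented for the example.

```python
from typing import Sequence


def weighted_position_score(match: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-position match scores (weights need not sum to 1)."""
    if not match or len(match) != len(weights):
        raise ValueError("match and weights must be non-empty and equal length")
    return sum(m * w for m, w in zip(match, weights)) / sum(weights)


def decayed_weights(n: int, gamma: float = 0.7) -> list[float]:
    """Hypothetical position-biased weighting: early positions dominate."""
    return [gamma ** i for i in range(n)]


def equal_weights(n: int) -> list[float]:
    """Every position counts the same, the idea EPS is named after."""
    return [1.0] * n


# Made-up per-position match scores for enrollment "turn the volume up" vs.
# query "turn the volume down": the shared prefix matches well and only the
# last two positions (the differing word) match poorly.
match = [0.95] * 10 + [0.2] * 2

biased = weighted_position_score(match, decayed_weights(len(match)))
equal = weighted_position_score(match, equal_weights(len(match)))
print(f"decayed: {biased:.3f}  equal: {equal:.3f}")  # → decayed: 0.939  equal: 0.825
```

Under the decayed weighting the mismatched suffix barely moves the score, so a prefix-sharing negative looks almost like a true match; with equal weights the suffix mismatch pulls the score down and a fixed threshold is more likely to reject it.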

Paper Structure

This paper contains 16 sections, 4 equations, 7 figures, and 1 table.

Introduction
Partial Overlap Benchmark
Partial Overlap
First-different Phoneme Index
Design of Partial Overlap Benchmark
Prefix Bias and EPS Module
Prefix Bias
Prefix Bias Can be Found in Prior Work
Equal-weighting Position Scoring (EPS) Module
Experiments
Results
Partial overlap is a major failure mode in baseline models
EPS reduces prefix bias with minimal in-domain change
POB augmentation improves POB performance but degrades prior metrics
Best cross-domain balance comes from combining EPS and POB

Figures (7)

Figure 1: General OV-KWS system with optional training-only heads.
Figure 2: Partial overlap
Figure 3: Alignment
Figure 4: Scoring & decision
Figure 6: Distribution of first-different phoneme index. LibriPhrase diverges almost immediately, whereas POB-LP and POB-Spark contain longer common prefixes before mismatch occurs, enabling more diverse overlapping cases for evaluation.
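The first-different phoneme index shown in Figure 6 measures how long a shared prefix lasts before an enrollment and a query diverge. A minimal sketch follows, assuming 0-based indexing and plain phoneme lists (the paper may index differently); the function name is hypothetical, and the ARPAbet transcriptions are hand-written with stress markers dropped.

```python
from typing import Optional, Sequence


def first_different_phoneme_index(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """Return the 0-based position where two phoneme sequences first differ.

    If one sequence is a strict prefix of the other, the first differing
    position is the length of the shorter one. Identical sequences return None.
    """
    for i, (pa, pb) in enumerate(zip(a, b)):
        if pa != pb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# "turn the volume up" vs. "turn the volume down" (hand-written ARPAbet).
up = "T ER N DH AH V AA L Y UW M AH P".split()
down = "T ER N DH AH V AA L Y UW M D AW N".split()
print(first_different_phoneme_index(up, down))  # → 11
```

A pair that diverges at index 0 or 1 is easy to reject; the figure's point is that POB-LP and POB-Spark contain many pairs with a large index like this one, unlike LibriPhrase.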

Theorems & Definitions (2)

Definition 2.1: Partial Overlap
Definition 3.1: Prefix Bias
