Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Peng Wang; Yifan Yang; Zheng Liang; Tian Tan; Shiliang Zhang; Xie Chen

TL;DR

The paper tackles named entity recognition in end-to-end ASR, which is made difficult by long-tail surface forms. It proposes C-FNT, a factorized neural Transducer augmented with a class-based language model. By scoring named entities through a dedicated @name class and constraining emission to a provided name list, C-FNT preserves standard ASR performance while substantially reducing named-entity errors, with relative WER improvements of up to $7.2\%$–$7.6\%$ and relative F1 gains of $27.9\%$–$30.8\%$ on targeted NER tests. Decoding uses beam search with four status transitions to enter, traverse, and leave the name class, and a dynamic beam size mitigates path duplication. The approach is modular: name lists and domain-specific entities can be updated easily while general recognition performance is maintained, which demonstrates the potential of integrating class-based LMs into E2E models.

Abstract

Despite advancements of end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding. Previous studies mainly focus on various rule-based or attention-based contextual biasing algorithms. However, their performance might be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along with a risk of false triggering. Inspired by the success of the class-based language model (LM) in NER in conventional hybrid systems and the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT. In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form. The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
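The idea of tying the LM score to the name class rather than the surface form can be illustrated with the standard class-based LM factorization, P(word | history) = P(@name | history) · P(word | @name). The sketch below is only a toy under stated assumptions: the bigram table, the name list, the uniform in-class distribution, and the function names are all hypothetical and are not taken from the paper, which uses a neural LM inside FNT.

```python
import math

# Toy word-level LM whose vocabulary contains the class token "@name".
# All probabilities are made up for illustration (assumption).
TOY_BIGRAM = {
    ("<s>", "call"): 0.5,
    ("call", "@name"): 0.4,
    ("@name", "</s>"): 0.6,
}

# Hypothetical user-provided name list.
NAME_LIST = ["alice", "bob", "carol", "dave"]


def in_class_logprob(word, name_list):
    """log P(word | @name), assumed uniform over the name list."""
    if word not in name_list:
        return -math.inf
    return -math.log(len(name_list))


def sentence_logprob(words, bigram, name_list, class_token="@name"):
    """Score a sentence, routing listed names through the class token."""
    total = 0.0
    prev = "<s>"
    for word in words + ["</s>"]:
        if word in name_list:
            # P(word | history) = P(@name | history) * P(word | @name)
            total += math.log(bigram.get((prev, class_token), 1e-6))
            total += in_class_logprob(word, name_list)
            prev = class_token  # later words condition on the class, not the name
        else:
            total += math.log(bigram.get((prev, word), 1e-6))
            prev = word
    return total


score_alice = sentence_logprob(["call", "alice"], TOY_BIGRAM, NAME_LIST)
score_bob = sentence_logprob(["call", "bob"], TOY_BIGRAM, NAME_LIST)
```

In this toy setup every listed name receives the same LM score in a given context, so a rare name is not penalized for how seldom its surface form appeared in training text. Changing the name list alters only the in-class term and leaves the word-level LM untouched.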

Paper Structure (14 sections, 6 equations, 2 figures, 3 tables)

Introduction
Related Works
Standard neural Transducer
Factorized neural Transducer
Reviewing NER in neural Transducer and hybrid system
Proposed Approach
Model architecture
Beam search decoding of C-FNT
Experiments and Results
Datasets and Evaluation Metrics
Experimental Setup
Results
Conclusion
Acknowledgements

Figures (2)

Figure 1: Illustration of three model structures: (a) standard neural Transducer (NT); (b) factorized neural Transducer (FNT); (c) proposed factorized neural Transducer with class-based LM (C-FNT).
Figure 2: An illustration of beam search decoding for C-FNT
