Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

TL;DR

The paper addresses the difficulty of recognizing named entities in end-to-end ASR, which stems from their long-tail surface forms. It proposes C-FNT, a factorized neural Transducer augmented with a class-based language model. By scoring named entities through a dedicated @name class and constraining emission to a provided name list, C-FNT preserves standard ASR performance while substantially reducing named-entity errors: targeted NER tests show relative WER improvements of up to $7.2\%$–$7.6\%$ and relative F1 gains of $27.9\%$–$30.8\%$. Decoding uses beam search with four status transitions to enter and leave the name class, and a dynamic beam size mitigates path duplication. Overall, the approach offers a modular, adaptable framework for NER in E2E ASR: name lists and domain-specific entities can be updated easily while general recognition performance stays strong. It also demonstrates the potential of integrating class-based LMs into E2E models.
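To make the idea of "constraining emission to a provided name list" concrete, the following is a minimal sketch, not the paper's implementation. It assumes the name list is stored as a prefix trie over subword tokens, so that a decoder inside the @name class could look up which tokens may legally follow the current partial name. The function names `build_trie` and `allowed_next`, the `<end>` marker, and the toy tokenization are all hypothetical and chosen only for illustration. The four status transitions and the dynamic beam size mentioned above are not modeled here.

```python
# Hypothetical sketch: a prefix trie over tokenized names, used to look up
# which tokens may extend a partial name. Not taken from the paper.

END = "<end>"  # assumed marker meaning "a complete name ends here"


def build_trie(names):
    """Build a nested-dict trie from a list of tokenized names."""
    root = {}
    for tokens in names:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}  # mark that a full name terminates at this node
    return root


def allowed_next(trie, prefix):
    """Return the set of tokens that may follow `prefix` inside the name class.

    An empty set means the prefix does not match any name in the list.
    """
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node.keys())


# Toy name list with an assumed subword segmentation.
name_trie = build_trie([["al", "ice"], ["al", "an"], ["bob"]])
```

In a beam search, a hypothesis that has entered the name class would only be extended with tokens from `allowed_next`, and could leave the class once `<end>` is among the allowed continuations.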

Abstract

Despite advancements of end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding. Previous studies mainly focus on various rule-based or attention-based contextual biasing algorithms. However, their performance might be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along with a risk of false triggering. Inspired by the success of the class-based language model (LM) in NER in conventional hybrid systems and the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT. In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form. The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
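The statement that the LM score of a named entity is associated with the name class rather than its surface form can be illustrated with a toy class-based LM. The sketch below is illustrative only. It assumes the standard class-based factorization, where a name's probability is the probability of the class token `@name` in context multiplied by an in-class probability, and it further assumes a uniform in-class distribution over the name list. The function `class_based_log_prob` and the toy probabilities are hypothetical and do not come from the paper.

```python
import math


def class_based_log_prob(word, lm_log_probs, name_list, class_token="@name"):
    """Score `word` under a toy class-based LM.

    lm_log_probs: dict mapping vocabulary items (including the class token)
                  to log-probabilities in the current context.
    name_list:    set of surface forms belonging to the name class.
    """
    if word in name_list:
        # The context-dependent score comes from the class token, so a rare
        # name is scored like any other name in the same context.
        in_class = -math.log(len(name_list))  # assumed uniform in-class prob
        return lm_log_probs[class_token] + in_class
    # Ordinary words are scored directly; unknown words get -inf.
    return lm_log_probs.get(word, float("-inf"))


# Toy context distribution (assumed values).
toy_lm = {"call": math.log(0.5), "@name": math.log(0.25)}
toy_names = {"alice", "bob"}
alice_score = class_based_log_prob("alice", toy_lm, toy_names)
```

Because the context-dependent term depends only on `@name`, replacing the name list changes which surface forms are reachable without retraining the context model, which matches the easy-update property described in the TL;DR.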

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Illustration of three model structures: (a) standard neural Transducer (NT); (b) factorized neural Transducer (FNT); (c) proposed factorized neural Transducer with class-based LM (C-FNT)
  • Figure 2: An illustration of beam search decoding for C-FNT