Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

Jiliang Hu; Zuchao Li; Mengjia Shen; Haojun Ai; Sheng Li; Jun Zhang

TL;DR

This paper proposes a joint speech recognition and structure learning framework (JSRSL), a span-based end-to-end SLU model that accurately transcribes speech and extracts structured content simultaneously, and achieves state-of-the-art performance on the AISHELL-NER and SLURP datasets.

Abstract

Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works have treated SLU as a sequence-to-sequence task and achieved great success. However, this approach is not well suited to simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), a span-based end-to-end SLU model that accurately transcribes speech and extracts structured content simultaneously. We conduct experiments on named entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on both datasets.

Paper Structure

This paper contains 12 sections, 10 equations, 2 figures, and 3 tables.

Introduction
Related Work
Method
SLU Representation Learning
Span-based Structure Learning
Joint Optimization
Experiment
Configuration
Main Result
Ablation Study
OOD Analysis
Conclusion

Figures (2)

Figure 1: The structure of our proposed framework, JSRSL.
Figure 2: Out-of-distribution (OOD) experiment results. Results on the Chinese datasets are shown on the left, and results on the English datasets are shown on the right.
