
An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

TL;DR

This approach adopts different extraction strategies depending on audio quality, striking a balance between interference removal and speech preservation, which benefits back-end automatic speech recognition (ASR) systems.

Abstract

This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies depending on audio quality, striking a balance between interference removal and speech preservation, which benefits back-end automatic speech recognition (ASR) systems. Experiments show that our approach achieves character error rates (CER) of 24.2% and 33.2% on the Dev and Eval sets, respectively, placing second in the challenge.
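
The results above are reported as character error rate. As a reminder of what that number measures, here is a minimal sketch of the standard CER definition: character-level edit distance (substitutions, deletions, insertions) summed over all utterances and divided by the total number of reference characters. The function names `edit_distance` and `cer` are illustrative, and the challenge's official scoring script and any text normalization it applies are not described in this summary, so this should not be read as the organizers' implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (character missing from hyp)
                curr[j - 1] + 1,     # insertion (extra character in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit errors divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_errors / total_chars


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(f"{cer(['abcd'], ['abed']):.2%}")
```

Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference. This is one reason residual interference left in the extracted audio can hurt the score as much as over-suppressed target speech.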

Paper Structure (8 sections, 1 equation, 1 figure, 2 tables)

This paper contains 8 sections, 1 equation, 1 figure, 2 tables.

Figures (1)

  • Figure 1: (a) Overview of our audio-quality-based multi-strategy approach; (b) TSE for medium-quality audio; (c) TSE for low-quality audio; (d) TSE for high-quality audio.