EMALG: An Enhanced Mandarin Lombard Grid Corpus with Meaningful Sentences

Baifeng Li, Qingmu Liu, Yuhong Yang, Hongyang Chen, Weiping Tu, Song Lin

TL;DR

This work introduces EMALG, an enhanced Mandarin Lombard grid corpus with meaningful sentences to overcome the limitations of MALG and enable robust Lombard-effect analysis in Mandarin. The authors collect data from 34 native Mandarin speakers producing 10,200 meaningful utterances across three noise levels, extract phonetic and acoustic parameters, and compare results with the English Lombard Grid. They show that meaningful sentences elicit stronger Lombard effects, with notable gender differences (females showing larger adjustments) and Mandarin-specific patterns in vowel and consonant adjustments. The findings corroborate prior English–Mandarin comparisons while emphasizing language-specific adaptations, with potential benefits for speech recognition, enhancement, and voice-conversion systems in noisy environments.

Abstract

This study investigates the Lombard effect, in which individuals adapt their speech in noisy environments. We introduce an enhanced Mandarin Lombard grid (EMALG) corpus with meaningful sentences, which extends the Mandarin Lombard grid (MALG) corpus. EMALG features 34 speakers and an improved recording setup, addressing the challenges MALG faced with nonsense sentences. Our findings reveal that, in Mandarin, meaningful sentences are more effective than nonsense sentences at eliciting the Lombard effect. Additionally, we find that female speakers exhibit a more pronounced Lombard effect than male speakers when uttering meaningful sentences. Moreover, our results reaffirm the consistency of the Lombard effect comparison between English and Mandarin found in previous research.

Paper Structure (13 sections, 3 figures, 1 table)

This paper contains 13 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Experimental setup of EMALG with meaningful sentences; the improvement is highlighted
  • Figure 2: Phonetic and acoustic parameters across talkers (left: female speakers; right: male speakers)
  • Figure 3: Phonetic and acoustic modifications across three background noise levels (30/40 dBA, 55 dBA, 80 dBA) using the MALG [yang2022mandarin] and EMALG corpora and the English Lombard Grid [alghamdi2018corpus]. Speech styles are categorized as Plain (P) at 30 dBA for MALG and the English Lombard Grid, and at 40 dBA for EMALG; Lombard 1 (L1) at 55 dBA; and Lombard 2 (L2) at 80 dBA. For EMALG, loudness comparisons are standardized by reducing the audio energy by 20 dBA. Statistical significance was determined using a paired t-test, with the results showing $p < 0.001$.
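
The Figure 3 caption mentions two numerical steps: lowering the EMALG level by 20 dB before comparing loudness, and testing differences with a paired t-test. The sketch below illustrates both using only the Python standard library. It is not the authors' pipeline: the function names (`attenuate_db`, `rms_level_db`, `paired_t_statistic`) are hypothetical, and it assumes the level reduction is applied as a uniform gain to the waveform, which the caption does not specify.

```python
import math
import statistics


def attenuate_db(samples, reduction_db=20.0):
    """Scale waveform samples so their level drops by `reduction_db` decibels.

    Assumption: a uniform (frequency-independent) gain is applied.
    A 20 dB reduction corresponds to an amplitude gain of 10 ** (-20 / 20) = 0.1.
    """
    gain = 10.0 ** (-reduction_db / 20.0)
    return [s * gain for s in samples]


def rms_level_db(samples):
    """Return the RMS level of the samples in dB relative to amplitude 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees_of_freedom) for a paired t-test of b against a.

    Each index must refer to the same speaker (or utterance) in both
    conditions, e.g. a parameter measured in Plain and in Lombard speech.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Illustrative numbers only; these are not values from the corpus.
    waveform = [0.5, -0.25, 0.75, -0.5]
    quieter = attenuate_db(waveform, 20.0)
    print(rms_level_db(waveform) - rms_level_db(quieter))

    plain = [1.0, 2.0, 3.0, 4.0]
    lombard = [2.0, 4.0, 5.0, 7.0]
    print(paired_t_statistic(plain, lombard))
```

The function returns only the t statistic and degrees of freedom. To obtain a p-value such as the one reported in the caption, the statistic would be compared against a Student t distribution, for example with `scipy.stats.ttest_rel` if SciPy is available.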