Data Augmentation for Pathological Speech Enhancement

Mingchi Hou; Enno Hermann; Ina Kodrasi

TL;DR

Data augmentation (DA) improves speech enhancement (SE) performance for pathological speakers, but a performance gap between neurotypical and pathological speech persists, highlighting the need for future research on targeted DA strategies for pathological speech.

Abstract

The performance of state-of-the-art speech enhancement (SE) models considerably degrades for pathological speech due to atypical acoustic characteristics and limited data availability. This paper systematically investigates data augmentation (DA) strategies to improve SE performance for pathological speakers, evaluating both predictive and generative SE models. We examine three DA categories, i.e., transformative, generative, and noise augmentation, assessing their impact with objective SE metrics. Experimental results show that noise augmentation consistently delivers the largest and most robust gains, transformative augmentations provide moderate improvements, while generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases. Furthermore, we show that the effectiveness of DA varies depending on the SE model, with DA being more beneficial for predictive SE models. While our results demonstrate that DA improves SE performance for pathological speakers, a performance gap between neurotypical and pathological speech persists, highlighting the need for future research on targeted DA strategies for pathological speech.
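As a rough illustration of the noise augmentation category mentioned in the abstract, the sketch below mixes a clean signal with noise at a chosen signal-to-noise ratio (SNR). This is one common way to build noisy training pairs and is an assumption here: the excerpt does not state the noise sources, SNR range, or mixing procedure used in the paper. The names `mix_at_snr` and `measured_snr_db` are hypothetical, the sine wave is only a stand-in for a speech recording, and plain Python lists are used to keep the example free of dependencies.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("clean and noise signals must have non-zero power")
    # Choose gain g so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a mixture, given the clean reference signal."""
    signal_power = sum(x * x for x in clean)
    residual_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / residual_power)


# Stand-in signals: a 220 Hz tone at 16 kHz instead of real speech (assumption).
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

Each (noisy, clean) pair produced this way could serve as an additional training example for an SE model. Drawing `snr_db` from a range rather than fixing it is a typical design choice, though the range itself would have to come from the paper's experimental settings.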

Paper Structure (21 sections, 1 equation, 2 figures, 1 table)

Introduction
Problem Formulation & SE Models
Augmentation Strategies
Transformative Augmentations
Pitch shift
Time stretch
SpecMix
Generative Augmentations
YourTTS
XTTS
Noise Augmentation
Experimental Settings
Dataset
Training
Augmentation
...and 6 more sections

Figures (2)

Figure 1: $\Delta$PESQ (left) and $\Delta$fwSSNR (right) for pathological speakers using the CR model with strategies from three DA categories at different augmentation ratios. Within each category, separate plots correspond to individual strategies. For reference, the baseline performance without any DA strategy is also shown.
Figure 2: $\Delta$PESQ (left) and $\Delta$fwSSNR (right) for pathological speakers using the SB model with strategies from three DA categories at different augmentation ratios. Within each category, separate plots correspond to individual strategies. For reference, the baseline performance without any DA strategy is also shown.
