Robustness of Speech Separation Models for Similar-pitch Speakers

Bunlong Lay; Sebastian Zaczek; Kristina Tesch; Timo Gerkmann

TL;DR

This paper investigates the robustness of state-of-the-art neural network models in scenarios where the pitch difference between speakers is minimal, and reveals that modern models have substantially reduced the performance gap between similar-pitch and different-pitch mixtures for matched training and testing conditions.

Abstract

Single-channel speech separation is a crucial task for enhancing speech recognition systems in multi-speaker environments. This paper investigates the robustness of state-of-the-art neural network models in scenarios where the pitch difference between speakers is minimal. Building on earlier findings by Ditter and Gerkmann, which identified a significant performance drop for the 2018 Chimera++ model under similar-pitch conditions, our study extends the analysis to more recent and sophisticated neural network models. Our experiments reveal that modern models have substantially reduced the performance gap for matched training and testing conditions. However, a substantial performance gap persists under mismatched conditions: models perform well when the pitch difference is large but worse when the speakers' pitches are similar. These findings motivate further research into the generalizability of speech separation models to similar-pitch speakers and unseen data.
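To make "pitch difference between speakers" concrete, the sketch below expresses it as the distance in semitones between the two speakers' mean fundamental frequencies (F0), computed as 12 * log2(f_a / f_b). This is an illustrative assumption, not necessarily the measure used in the paper's Pitch estimation section. The helper names `mean_voiced_f0` and `pitch_difference_semitones` are hypothetical, and the code assumes an F0 tracker has already produced one value per frame, with unvoiced frames marked as 0 or NaN.

```python
import math


def mean_voiced_f0(f0_track):
    """Average F0 in Hz over voiced frames (hypothetical helper).

    Assumes unvoiced frames are marked with 0 or NaN and skips them.
    """
    voiced = [f for f in f0_track if not math.isnan(f) and f > 0]
    if not voiced:
        raise ValueError("F0 track contains no voiced frames")
    return sum(voiced) / len(voiced)


def pitch_difference_semitones(f0_track_a, f0_track_b):
    """Absolute distance between the two speakers' mean F0s, in semitones.

    One octave (a factor of 2 in frequency) corresponds to 12 semitones.
    """
    f_a = mean_voiced_f0(f0_track_a)
    f_b = mean_voiced_f0(f0_track_b)
    return abs(12.0 * math.log2(f_a / f_b))


# Usage: speakers at 110 Hz and 220 Hz are one octave apart.
print(pitch_difference_semitones([0, 110, 110, float("nan")], [220, 0, 220]))
```

A log-frequency (semitone) scale is used here because pitch perception is roughly logarithmic, so a 10 Hz gap matters more for low voices than for high ones.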

Paper Structure

This paper contains 13 sections, 5 equations, 1 figure, and 1 table.

Introduction
Speech Separation
Problem formulation
Mask-based Neural Networks
Deep Clustering
Research Question
Analysis Framework
Datasets
SOTA single-channel speech separation models
Pitch estimation
Metrics
Results
Conclusion

Figures (1)

Figure 1: Performance gap between mixtures with similar-pitch speakers and mixtures with different-pitch speakers in the matched testing case (black line) and the mismatched testing case (yellow line).
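A single point of such a performance gap can be sketched as the mean separation score of different-pitch mixtures minus the mean score of similar-pitch mixtures. This sketch rests on assumptions. The score is any per-mixture quality metric (for example, an SDR-style improvement in dB). The split uses a hypothetical pitch-difference threshold (`threshold_semitones`) in semitones. Neither the metric nor the threshold is taken from the paper. A positive value means the model separates different-pitch speakers better than similar-pitch ones.

```python
from statistics import mean


def performance_gap(pitch_diffs, scores, threshold_semitones=2.0):
    """Mean score of different-pitch mixtures minus mean score of similar-pitch ones.

    pitch_diffs: per-mixture pitch difference between the two speakers (semitones).
    scores: per-mixture separation score (higher is better).
    threshold_semitones: hypothetical boundary between "similar" and "different" pitch.
    """
    if len(pitch_diffs) != len(scores):
        raise ValueError("pitch_diffs and scores must have the same length")
    similar = [s for d, s in zip(pitch_diffs, scores) if d < threshold_semitones]
    different = [s for d, s in zip(pitch_diffs, scores) if d >= threshold_semitones]
    if not similar or not different:
        raise ValueError("both pitch groups need at least one mixture")
    return mean(different) - mean(similar)


# Usage with made-up illustrative numbers (not results from the paper):
# similar-pitch mean is 9.0, different-pitch mean is 15.0, so the gap is 6.0.
print(performance_gap([0.5, 1.0, 5.0, 8.0], [8.0, 10.0, 14.0, 16.0]))
```

Computing this value separately for models evaluated on matched and mismatched test data would give the two quantities that Figure 1 contrasts.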