Determined Multichannel Blind Source Separation with Clustered Source Model

Jianyu Wang, Shanzheng Guan

TL;DR

A clustered source model based on nonnegative block-term decomposition (NBTD) is introduced that outperforms ILRMA and its extensions in anechoic conditions and surpasses the original ILRMA in simulated reverberant environments.

Abstract

The independent low-rank matrix analysis (ILRMA) method is a prominent technique for multichannel blind audio source separation. It leverages nonnegative matrix factorization (NMF) and nonnegative canonical polyadic decomposition (NCPD) to model source parameters. The NMF model captures the low-rank structure of sources effectively but overlooks inter-channel dependencies. NCPD, on the other hand, preserves the intrinsic structure but lacks interpretable latent factors, which makes it difficult to incorporate prior information as constraints. To address these limitations, we introduce a clustered source model based on nonnegative block-term decomposition (NBTD). This model defines blocks as outer products of vectors (clusters) and matrices (for spectral structure modeling), which yields interpretable latent vectors. It also allows orthogonality constraints to be integrated directly to ensure independence among source images. Experimental results demonstrate that the proposed method outperforms ILRMA and its extensions in anechoic conditions and surpasses the original ILRMA in simulated reverberant environments.
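
To make the block structure concrete, the following is a minimal NumPy sketch of a nonnegative block-term model in which each block is the outer product of a low-rank nonnegative matrix and a nonnegative vector. The function name `nbtd_reconstruct`, the factor names `W`, `H`, `Z`, and all sizes are illustrative assumptions rather than the paper's notation, and only the forward model is shown, not the separation algorithm or its update rules.

```python
import numpy as np


def nbtd_reconstruct(W, H, Z):
    """Sum of blocks, each block = (W[o] @ H[o]) outer Z[o].

    W: (n_blocks, n_freq, n_bases)   nonnegative spectral bases (hypothetical name)
    H: (n_blocks, n_bases, n_frames) nonnegative activations (hypothetical name)
    Z: (n_blocks, n_slices)          nonnegative "cluster" vectors (hypothetical name)
    Returns a nonnegative tensor of shape (n_freq, n_frames, n_slices).
    """
    # Low-rank matrix part of every block: rank at most n_bases.
    blocks = np.einsum("ofl,olt->oft", W, H)
    # Outer product of each matrix with its vector, summed over blocks.
    return np.einsum("oft,on->ftn", blocks, Z)


# Small random example; the sizes are arbitrary.
rng = np.random.default_rng(0)
n_freq, n_frames, n_slices, n_blocks, n_bases = 6, 8, 2, 3, 2
W = rng.random((n_blocks, n_freq, n_bases))
H = rng.random((n_blocks, n_bases, n_frames))
Z = rng.random((n_blocks, n_slices))

X = nbtd_reconstruct(W, H, Z)
print(X.shape)  # (6, 8, 2)
```

In this sketch each block contributes a matrix of rank at most `n_bases`, and the entries of its vector scale that matrix along the third axis, which is why the vectors can be read as cluster weights.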

Paper Structure (8 sections, 23 equations, 5 figures)

Figures (5)

  • Figure 1: Illustration of block term decomposition.
  • Figure 2: SDR and SIR improvements for the studied methods.
  • Figure 3: Average SDR and SIR improvements versus the value of $O$. Conditions: source signals are from two female speakers in WSJ0 and there is no reverberation.
  • Figure 4: Average SDR and SIR improvements versus the number of bases. Conditions: source signals are from two female speakers in WSJ0 and there is no reverberation.
  • Figure 5: Average SDR and SIR improvements versus iteration number. Conditions: source signals are from two female speakers in WSJ0 and there is no reverberation.