Independent low-rank matrix analysis based on the Sinkhorn divergence source model for blind source separation
Jianyu Wang, Shanzheng Guan, Jingdong Chen, Jacob Benesty
TL;DR
The paper tackles determined blind source separation (BSS) for audio by relaxing ILRMA's assumption that the spectra in different frequency bands are independent. It introduces a Sinkhorn-divergence-based source model, denoted $D_{\mathrm{S}}^{\mu,\gamma}$, that exploits cross-band spectral dependencies and replaces the Itakura-Saito (IS) divergence used in ILRMA. To manage the resulting growth in the number of parameters, it develops an all-one Kronecker product decomposition that enables efficient transport-plan computation and updates for the NMF-based source model. Empirical results on WSJ0-like mixtures show that SDILRMA outperforms several baselines, including ILRMA and MNMF, while the Kronecker-decomposed variant notably reduces the computational load without sacrificing accuracy, which suggests practical benefits for real-time or large-scale BSS tasks.
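To make the idea of a transport-based spectral cost concrete, the following is a minimal sketch of a generic entropy-regularized optimal transport cost computed with Sinkhorn iterations. It is not the paper's $D_{\mathrm{S}}^{\mu,\gamma}$: the function name `sinkhorn_cost`, the use of `gamma` as the entropic regularization weight, the squared-distance cost matrix over frequency bins, and the toy spectra are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_cost(a, b, C, gamma=0.5, n_iter=200):
    """Entropy-regularized transport cost <P, C> between histograms a and b.

    Hypothetical helper: a and b are strictly positive vectors that each
    sum to 1, and C[i, j] is the cost of moving mass from bin i to bin j.
    """
    K = np.exp(-C / gamma)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # rescale to match column marginals
        u = a / (K @ v)               # rescale to match row marginals
    P = u[:, None] * K * v[None, :]   # transport plan
    return float(np.sum(P * C)), P

# Toy example: two normalized 4-bin "spectra" (assumed values).
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.3, 0.4])
idx = np.arange(4)
C = (idx[:, None] - idx[None, :]) ** 2 / 9.0   # cost grows with bin distance
cost, P = sinkhorn_cost(a, b, C)
print(cost)
```

Unlike a bin-by-bin divergence, the cost matrix `C` couples different frequency bins, which is the kind of cross-band information such a model can exploit.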
Abstract
The so-called independent low-rank matrix analysis (ILRMA) has demonstrated great potential for dealing with the problem of determined blind source separation (BSS) for audio and speech signals. This method assumes that the spectra from different frequency bands are independent and that the spectral coefficients in any frequency band are Gaussian distributed. The Itakura-Saito divergence is then employed to estimate the source-model parameters. In reality, however, the spectral coefficients from different frequency bands may be dependent, which is not considered in the existing ILRMA algorithm. This paper presents an improved version of ILRMA that considers the dependency between the spectral coefficients from different frequency bands. The Sinkhorn divergence is then exploited to optimize the source-model parameters. As a result of using the cross-band information, the BSS performance is improved. However, the number of parameters to be estimated also increases significantly, and so does the computational complexity. To reduce the algorithm complexity, we apply the Kronecker product to decompose the modeling matrix into the product of a number of matrices of much smaller dimensionality. An efficient algorithm is then developed to implement the Sinkhorn-divergence-based BSS algorithm, and the complexity is reduced by an order of magnitude.
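The complexity argument rests on a standard property of Kronecker-structured matrices: they can be applied to a vector through their small factors, without ever forming the large matrix. The sketch below illustrates only that generic identity, $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{\mathsf{T}})$ in row-major ordering. It is not the paper's specific decomposition, and the function name `kron_matvec` and the matrix sizes are assumptions chosen for illustration.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product.

    Hypothetical helper: A is (m, n), B is (p, q), x has length n * q.
    """
    m, n = A.shape
    p, q = B.shape
    X = x.reshape(n, q)                    # row-major: x[j * q + l] == X[j, l]
    return (A @ X @ B.T).reshape(m * p)    # entry i * p + k equals (A X B^T)[i, k]

# Compare against the explicit product on small random factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 6))
x = rng.standard_normal(4 * 6)
y_fast = kron_matvec(A, B, x)
y_full = np.kron(A, B) @ x                 # forms the full 15 x 24 matrix
print(np.allclose(y_fast, y_full))
```

For square factors of sizes $n_1$ and $n_2$, the explicit product costs on the order of $(n_1 n_2)^2$ operations per matrix-vector product, whereas the factored form costs on the order of $n_1 n_2 (n_1 + n_2)$.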
