
Informed FastICA: Semi-Blind Minimum Variance Distortionless Beamformer

Zbyněk Koldovský, Jiří Málek, Jaroslav Čmejla, Stephen O'Regan

TL;DR

The paper addresses the extraction of a source of interest (SOI) from multi-microphone mixtures. It proposes a semi-blind extension of FastICA/FastIVA in which the orthogonality constraint is replaced by the analytic form of the minimum variance distortionless (MVDR) beamformer. Side information enters through a weighted covariance matrix $\widehat{\mathbf C}_\alpha^{[k]}$, which takes the place of the noise covariance in the MVDR solution; from this the authors derive a second-order update for the mixing vector, linking model-based blind extraction with learning-based methods. When $K=1$, the method reduces to the classic FastICA update, while for $K>1$ it behaves as an informed extension resembling MVDR-based IVA. Results in simulations and speaker extraction tasks show faster convergence and improved robustness to short data or a weak SOI, with embeddings enabling effective target extraction in reverberant, multi-speaker scenarios.
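
For orientation, the classic one-unit FastICA update that the summary refers to can be sketched as follows. This is a textbook, real-valued version with a tanh nonlinearity on whitened data, not the paper's complex-valued, MVDR-constrained algorithm; the function names, the toy sources, and the mixing matrix are all assumptions made for illustration.

```python
import numpy as np

def whiten(X):
    """Center X (d x N) and whiten it so its sample covariance is the identity."""
    X = X - X.mean(axis=1, keepdims=True)
    C = X @ X.T / X.shape[1]
    evals, evecs = np.linalg.eigh(C)
    # C^{-1/2} = E diag(evals^{-1/2}) E^T
    return (evecs / np.sqrt(evals)) @ evecs.T @ X

def fastica_one_unit(Z, n_iter=100, tol=1e-10, seed=0):
    """Textbook one-unit FastICA fixed-point iteration on whitened data Z."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                      # current source estimate
        g = np.tanh(y)                 # nonlinearity g(y)
        g_prime = 1.0 - g ** 2         # its derivative g'(y)
        # Approximate Newton step: E[z g(y)] - E[g'(y)] w, then renormalize
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = 1.0 - abs(w_new @ w) < tol   # sign-invariant check
        w = w_new
        if converged:
            break
    return w

# Toy example (assumed data): two non-Gaussian sources, fixed mixing matrix.
rng = np.random.default_rng(1)
N = 5000
S = np.vstack([rng.laplace(size=N), rng.uniform(-1.0, 1.0, size=N)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])
Z = whiten(A @ S)
w = fastica_one_unit(Z)
y = w @ Z   # extracted signal, up to sign and scale
```

Which of the two sources the iteration locks onto depends on the initialization; that ambiguity is what side information is meant to resolve in the informed variant described above.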

Abstract

Non-Gaussianity-based Independent Vector Extraction leads to the famous one-unit FastICA/FastIVA algorithm when the likelihood function is optimized using an approximate Newton-Raphson algorithm under the orthogonality constraint. In this paper, we replace the constraint with the analytic form of the minimum variance distortionless beamformer (MVDR), by which a semi-blind variant of FastICA/FastIVA is obtained. The side information here is provided by a weighted covariance matrix replacing the noise covariance matrix, the estimation of which is a frequent goal of neural beamformers. The algorithm thus provides an intuitive connection between model-based blind extraction and learning-based extraction. The algorithm is tested in simulations and speaker ID-guided speaker extraction, showing fast convergence and promising performance.
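
The MVDR form mentioned in the abstract can be illustrated with a short NumPy sketch. It assumes the standard closed form $\mathbf w = \mathbf C^{-1}\mathbf a / (\mathbf a^H \mathbf C^{-1}\mathbf a)$ and a weighted sample covariance $\frac{1}{N}\sum_n \alpha(n)\,\mathbf x(n)\mathbf x(n)^H$; the exact weighting used in the paper is not given in this summary, so the weights `alpha`, the function names, and the toy data below are hypothetical.

```python
import numpy as np

def weighted_covariance(X, alpha):
    """Weighted sample covariance of X (d x N) with real weights alpha (N,)."""
    N = X.shape[1]
    return (X * alpha) @ X.conj().T / N

def mvdr_weights(C, a):
    """MVDR beamformer w = C^{-1} a / (a^H C^{-1} a), which satisfies w^H a = 1."""
    Cinv_a = np.linalg.solve(C, a)
    return Cinv_a / (a.conj() @ Cinv_a)

# Toy single-frequency example (assumed data): SOI through mixing vector a, plus noise.
rng = np.random.default_rng(0)
d, N = 4, 500
a = rng.standard_normal(d) + 1j * rng.standard_normal(d)
s = rng.laplace(size=N) + 1j * rng.laplace(size=N)
noise = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))
X = np.outer(a, s) + noise

# Hypothetical side information: small weights where the SOI is strong, so the
# weighted covariance leans towards the noise covariance. In practice such
# weights might come from a learned model; here they are an oracle stand-in.
alpha = 1.0 / (1.0 + np.abs(s) ** 2)

C_alpha = weighted_covariance(X, alpha)
w = mvdr_weights(C_alpha, a)
response = np.vdot(w, a)   # w^H a, equal to 1 up to rounding (distortionless)
```

Using `np.linalg.solve` rather than an explicit inverse is a common numerical choice and is not something the source prescribes.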

Paper Structure

This paper contains 15 sections, 18 equations, and 2 figures.

Figures (2)

  • Figure 1: Success rate and SIR averaged over successful trials of the compared algorithms as functions of $N$ (when SIR$_{\rm ini}=0$ dB) and SIR$_{\rm ini}$ (when $N=200$); each setting was repeated in 1000 trials.
  • Figure 2: MC-WSJ0-2mix: Speaker extraction metrics achieved by the compared algorithms.