Subband Splitting: Simple, Efficient and Effective Technique for Solving Block Permutation Problem in Determined Blind Source Separation

Kazuki Matsumoto, Kohei Yatabe

TL;DR

This paper tackles the block permutation problem in determined blind source separation by introducing subband splitting (SS), which partitions the frequency axis into overlapping subbands and applies existing BSS methods sequentially. The approach preserves permutation alignment across subbands by propagating initialization through overlaps and employs shift-based subband generation controlled by parameters $(\theta_W, \theta_\Delta)$, without modifying the underlying BSS algorithms. When combined with IVA and ILRMA (as SS-IVA and SS-ILRMA), the method significantly improves separation performance and convergence speed, achieving near-oracle results for SS-ILRMA on speech and strong gains on music signals. The findings suggest substantial practical impact for robust, efficient BSS in reverberant environments, with potential for real-time deployment and integration with advanced BSS models.

Abstract

Solving the permutation problem is essential for determined blind source separation (BSS). Existing methods, such as independent vector analysis (IVA) and independent low-rank matrix analysis (ILRMA), tackle the permutation problem by modeling the co-occurrence of the frequency components of source signals. One of the remaining challenges in these methods is the block permutation problem, which may cause severe performance degradation. In this paper, we propose a simple and effective technique for solving the block permutation problem. The proposed technique splits the entire frequency band into several overlapping subbands and sequentially applies a BSS method (e.g., IVA, ILRMA, or any other method) to each subband. Since the splitting reduces the size of the problem, the BSS method can work effectively in each subband. Then, the permutations among the subbands are aligned by using the separation result in one subband as the initial values for the other subbands. Additionally, we propose SS-IVA and SS-ILRMA by combining subband splitting (SS) with IVA and ILRMA. Experimental results demonstrated that our technique remarkably improves the separation performance without increasing computational cost. In particular, our SS-ILRMA achieved separation performance comparable to that of the oracle method (frequency-domain independent component analysis with the ideal permutation solver). Moreover, SS-ILRMA converged faster than conventional IVA and ILRMA.
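The control flow described in the abstract (overlapping subbands, sequential processing, warm starts through the overlap) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `width` and `hop` are generic window parameters rather than the paper's $(\theta_W, \theta_\Delta)$, `toy_separate` is a stand-in for IVA or ILRMA, and the per-bin state is reduced to a single source-order label instead of a demixing matrix.

```python
from typing import Callable, Dict, List

# Hypothetical per-bin state: in a real system this would hold the demixing
# matrix (and any auxiliary variables) of one frequency bin.
State = int
Separate = Callable[[range, Dict[int, State]], Dict[int, State]]


def make_subbands(num_bins: int, width: int, hop: int) -> List[range]:
    """Cover bins 0..num_bins-1 with windows of `width` bins shifted by `hop`.

    `width` and `hop` are illustrative names, not the paper's parameters.
    """
    if num_bins <= 0 or not 0 < hop < width:
        raise ValueError("need num_bins > 0 and 0 < hop < width")
    subbands = []
    lo = 0
    while True:
        hi = min(lo + width, num_bins)
        subbands.append(range(lo, hi))
        if hi == num_bins:
            return subbands
        lo += hop


def run_subband_splitting(subbands: List[range], separate: Separate) -> Dict[int, State]:
    """Process subbands in order, warm-starting each one from its overlap."""
    states: Dict[int, State] = {}
    for bins in subbands:
        # Bins already solved by an earlier subband provide the initial values.
        init = {f: states[f] for f in bins if f in states}
        states.update(separate(bins, init))
    return states


def toy_separate(bins: range, init: Dict[int, State]) -> Dict[int, State]:
    """Stand-in for IVA/ILRMA: the state is just a source-order label (0 or 1).

    Without an initial value the label is arbitrary, which mimics the
    permutation ambiguity; with one, the solver keeps the inherited order.
    """
    label = next(iter(init.values())) if init else (bins.start // 2) % 2
    return {f: label for f in bins}


subbands = make_subbands(num_bins=10, width=4, hop=2)
independent = [toy_separate(b, {}) for b in subbands]   # labels disagree
aligned = run_subband_splitting(subbands, toy_separate)  # one shared label
```

In this toy setting, solving each subband independently yields alternating source orders, whereas the sequential run inherits the order of the first subband in every bin. A real BSS method follows its initial values only approximately, so the sketch shows the mechanism, not a guarantee.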

Paper Structure

This paper contains 19 sections, 26 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of the proposed technique, named subband splitting. The observed signal $\mathbf{x}$ is split into overlapping subbands $(\mathbf{x}_{\mathcal{F}_i})_{i=1}^I$, and a BSS method (e.g., IVA, ILRMA, or any other method) sequentially separates each subband $\mathcal{F}_i$ by using the demixing matrices $\mathbf{W}_{\!\mathcal{F}_i}$. The separation result in the $i$th subband $\mathcal{F}_{i}$, including the auxiliary variable $\boldsymbol{\Theta}_{\!\mathcal{F}_i}$, is used as the initial values in the next subband $\mathcal{F}_{i+1}$, which aligns the permutation among the subbands.
  • Figure 2: Example of separation results and their evaluation metric (SDRi) in a two-channel, two-source situation ($M=N=2$). The observed signal (a) is generated from dev1_male4_src_1.wav and dev1_male4_src_2.wav in the SiSEC 2011 dataset \cite{araki2011SignalSeparation2012} by convolving them with room impulse responses recorded in a real room \cite{hadadMultichannelAudioDatabase2014}. The sources are placed at $-75^\circ$ and $60^\circ$, respectively, and the other experimental conditions are the same as those described in Section \ref{sec:commonCond}. The green and red colors represent the two different sources, where the ratio of their energy was calculated using the oracle sources. For visibility, the frequency axis is trimmed to the range from 0 to 3 kHz. For the separated signals (b) and (c), the colors of the left bars indicate the dominant source in each frequency band, where the letters G and R correspond to green and red, respectively. While the conventional IVA in (b) resulted in poor SDRi due to the block permutation problem, our proposed SS-IVA in (c) successfully aligned the permutations across all frequencies, even though the BSS algorithm used in (b) and (c) was the same (i.e., AuxIVA \cite{onoStableFastUpdate2011}).
  • Figure 3: Frequency-wise SI-SDR of the separation result in Fig. \ref{fig:perm}(b), where the frequency bins are decimated by a factor of 16 for visibility. The light gray area represents the performance of IVA, where the block permutation problem arises in frequency bands from 0.5 to 1.5 kHz. The dotted line refers to IVA after the permutations are corrected using IPS. For reference, the blue line shows the result of FDICA + IPS.
  • Figure 4: Subbands generated by Eqs. \ref{eq:shift}, \ref{params}, and \ref{eq:initLH}. The horizontal axis indicates the frequency index $f = 1,\ldots, F$. The bounds $(L_i, H_i)$ are indicated by the rounded edges of the bars, and the corresponding subband $\mathcal{F}_i$ is shown by the filled area. The subband parameters were set to $(\theta_W,\theta_\Delta) = (3,2)$. Note that every frequency bin is included in $2\,({}=\theta_\Delta)$ subbands.
  • Figure 5: SDRi and permutation consistency of each method for speech signals. The proposed methods are emphasized by (light) green and bold letters. The number of bases $K$ for ILRMA is indicated by $\langle 2 \rangle$ and $\langle 10 \rangle$. Boxes for IVA-based methods contain 224 ($=$ 56 pairs of speech sources $\times$ 4 pairs of source directions) results, while those for ILRMA-based methods contain 1120 (${}=224 \times 5$ seeds) results. Large markers show the worst cases. For the proposed SS-IVA and SS-ILRMA, the direction of the shift is indicated by $\Uparrow$ (upward) and $\Downarrow$ (downward).
  • ...and 5 more figures