
Maximum Discrepancy Generative Regularization and Non-Negative Matrix Factorization for Single Channel Source Separation

Martin Ludvigsen, Markus Grasmair

TL;DR

This work tackles single-channel source separation (SCSS) under weak supervision by proposing Maximum Discrepancy Generative Regularization (MDGR), a framework that adversarially trains a generative model for each source. The authors instantiate MDGR as Maximum Discrepancy NMF (MDNMF), which combines weakly supervised data, adversarial data, and optional strongly supervised data to train NMF dictionaries with multiplicative updates. Combining MDNMF with discriminative NMF training (DNMF) yields the D+MDNMF family, which performs robustly when labeled data are limited; the framework also gives a principled way to penalize fitting adversarial examples through an objective resembling an integral probability metric (IPM). Numerical experiments on MNIST and on speech enhancement show that MDNMF and its variants can outperform standard NMF and DNMF, particularly when supervision is scarce, which points to practical value for image and audio separation under weak supervision.
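The TL;DR states that the NMF dictionaries are trained with multiplicative updates. As a point of reference, here is a minimal sketch of the classical Lee-Seung multiplicative updates for plain NMF with a squared Frobenius loss. It is not the MDNMF update (which also involves adversarial data), and the function names, random initialization and iteration count are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Dense product of nested lists A (m x k) and B (k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, d, iters=200, eps=1e-12, seed=0):
    """Factor nonnegative V (m x n) as W (m x d) @ H (d x n) by minimizing
    ||V - W H||_F^2 with the classical Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive initialization so multiplicative updates can move every entry.
    W = [[rng.random() + 0.1 for _ in range(d)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(d)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frob_error(V, W, H):
    """Squared Frobenius reconstruction error ||V - W H||_F^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))
```

Because each update is a ratio of nonnegative quantities, W and H stay nonnegative, and in theory neither update increases the loss, so more iterations from the same starting point should never give a worse fit.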

Abstract

The idea of adversarial learning of regularization functionals has recently been introduced in the wider context of inverse problems. The intuition behind this method is the realization that it is necessary to learn not only the basic features that make up a class of signals one wants to represent, but also, or even more so, which features to avoid in the representation. In this paper, we apply this approach to the training of generative models, leading to what we call Maximum Discrepancy Generative Regularization. In particular, we apply it to the problem of source separation by means of Non-negative Matrix Factorization (NMF) and present a new method for the adversarial training of NMF bases. We show in numerical experiments, for both image and audio separation, that this leads to a clear improvement of the reconstructed signals, in particular when little or no strong supervision data is available.
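The abstract describes source separation by means of NMF: each source has its own nonnegative dictionary, and a mixture is explained by all dictionaries together. Below is a minimal sketch of that separation step, assuming the dictionaries are already trained and that the codes solve an L1-penalized nonnegative least-squares problem via multiplicative updates. The function name `separate`, the penalty weight `lam` and the iteration count are hypothetical choices, not taken from the paper.

```python
def separate(v, dicts, lam=0.0, iters=500, eps=1e-12):
    """Estimate each source in the mixture v from fixed nonnegative dictionaries.

    v     : list of m nonnegative floats (the mixture)
    dicts : list of dictionaries W_i, each given as m rows of d_i nonnegative floats
    lam   : weight of the L1 penalty on the codes (hypothetical parameter)
    """
    m = len(v)
    # Stack the source dictionaries side by side: W = [W_1, ..., W_S].
    W = [sum((list(D[r]) for D in dicts), []) for r in range(m)]
    k = len(W[0])
    cols = [[W[r][j] for r in range(m)] for j in range(k)]                     # columns of W
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]  # W^T W
    wtv = [sum(a * b for a, b in zip(c, v)) for c in cols]                       # W^T v
    h = [1.0] * k
    for _ in range(iters):
        # Multiplicative update for min_{h >= 0} 0.5*||v - W h||^2 + lam*sum(h):
        # h <- h * (W^T v) / (W^T W h + lam)
        gh = [sum(g * x for g, x in zip(row, h)) for row in gram]
        h = [x * num / (den + lam + eps) for x, num, den in zip(h, wtv, gh)]
    # Split the code into per-source blocks and decode each one: s_i = W_i h_i.
    estimates, start = [], 0
    for D in dicts:
        d = len(D[0])
        h_i = h[start:start + d]
        estimates.append([sum(a * b for a, b in zip(row, h_i)) for row in D])
        start += d
    return estimates
```

With `lam > 0` the codes are shrunk toward zero, trading some reconstruction accuracy for sparsity. The source estimates are only as good as the dictionaries, which is exactly what adversarial training of the bases aims to improve.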

Paper Structure

This paper contains 23 sections, 3 theorems, 56 equations, 11 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 6

Assume that Assumptions ass:1 and ass:2 hold. In addition, assume that the sets $\mathcal{G}_i$ are (sequentially) compact, that the training data have finite second moments for all $i=1,\ldots,S$, and that … . Then problem (eq:weightedsum) admits a solution.

Figures (11)

  • Figure 1: A general encoder-decoder framework. Here, encoding and decoding are done using NMF with sparsity (see the section on NMF for details). Although encoding and decoding do not reconstruct signals perfectly, especially with low-complexity models, they are still useful as a prior on signals. They are also robust to noise: the process yields similar outputs for a signal and for a noisy version of that signal.
  • Figure 2: Illustration of the difference between strong supervision and weak supervision. In the strong supervision situation, all paired data is available. In weak supervision, only data from the marginals are available, and these data are unpaired.
  • Figure 3: Illustration of the difference between NMF and MDNMF. Here we use $\lambda = [2\cdot10^{-2}, 2\cdot10^{-2}, 3\cdot10^{-2}]$, $\tau_W = 1$ and $\tau_A = \sqrt{0.25}$. A basis fitted with NMF can be sensitive to outliers and may also represent adversarial data. MDNMF explicitly avoids fitting adversarial data, potentially at the cost of representing outlier data less well. This property can be beneficial for downstream tasks such as source separation.
  • Figure 4: Results from experiments in a data-rich, strongly supervised setting. The lines show the median PSNR of the methods' reconstructions on the test dataset, along with the standard error. For MDNMF we use $\tau_W = 1$ and $\tau_A = 0.2$. Performance tends to improve as the number $d$ of basis vectors increases, and MDNMF consistently outperforms the other methods.
  • Figure 5: Results from experiments in a data-rich, strongly supervised setting with $\tau_W = 1$ and varying $\tau_A$ for MDNMF. The lines show the median PSNR over the test dataset for different parameter values, along with the standard error. Note that $\tau_A = 0$ corresponds to standard NMF. Choosing $\tau_A$ too large leads to much worse performance, but a relatively wide range of values yields better performance than standard NMF. The performance gap between NMF and MDNMF also grows as $d$ increases.
  • ...and 6 more figures
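The Figure 3 and Figure 5 captions indicate that MDNMF weights the fit to source data by $\tau_W$ and penalizes the fit to adversarial data by $\tau_A$, with $\tau_A = 0$ recovering standard NMF, while Figures 4 and 5 report PSNR. The sketch below evaluates an objective of that shape for a fixed dictionary, assuming it equals $\tau_W$ times the mean sparse-coding error on source data minus $\tau_A$ times the mean sparse-coding error on adversarial data. The exact MDNMF objective, any constraint set on the dictionary, and the training updates are not reproduced here, and all function names are hypothetical. A standard PSNR helper is included for scoring reconstructions.

```python
import math


def sparse_code_error(v, W, lam, iters=300, eps=1e-12):
    """Approximate min_{h >= 0} 0.5*||v - W h||^2 + lam*sum(h) with multiplicative
    updates and return that value (the encode-then-decode error of v under W)."""
    m, k = len(W), len(W[0])
    cols = [[W[r][j] for r in range(m)] for j in range(k)]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    wtv = [sum(a * b for a, b in zip(c, v)) for c in cols]
    h = [1.0] * k
    for _ in range(iters):
        gh = [sum(g * x for g, x in zip(row, h)) for row in gram]
        h = [x * num / (den + lam + eps) for x, num, den in zip(h, wtv, gh)]
    recon = [sum(a * b for a, b in zip(row, h)) for row in W]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(v, recon)) + lam * sum(h)


def discrepancy_loss(W, data, adversarial, lam, tau_W=1.0, tau_A=0.2):
    """Hypothetical MDNMF-style objective: reward fitting the source data (weight tau_W)
    and penalize fitting adversarial examples (weight tau_A). tau_A = 0 gives the NMF loss."""
    fit = sum(sparse_code_error(v, W, lam) for v in data) / len(data)
    adv = sum(sparse_code_error(v, W, lam) for v in adversarial) / len(adversarial)
    return tau_W * fit - tau_A * adv


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

In a toy case with one source direction and one adversarial direction, a dictionary aligned with the source data gets a lower loss than one aligned with the adversarial data, which is the behavior Figure 3 describes.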

Theorems & Definitions (11)

  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 6
  • Remark 7
  • Theorem 8
  • Remark 9
  • Definition 10
  • Lemma 11
  • Proof
  • ...and 1 more