Equivariance-based self-supervised learning for audio signal recovery from clipped measurements

Victor Sechaud, Laurent Jacques, Patrice Abry, Julián Tachella

TL;DR

The proposed equivariance-based self-supervised declipping strategy compares favorably to fully supervised learning while requiring only clipped measurements for training.

Abstract

In numerous inverse problems, state-of-the-art solving strategies involve training neural networks on datasets of ground-truth signals and associated measurements, which may be expensive or impossible to collect. Recently, self-supervised learning techniques have emerged, with the major advantage of no longer requiring ground-truth data. Most theoretical and experimental results on self-supervised learning focus on linear inverse problems. The present work studies self-supervised learning for the non-linear inverse problem of recovering audio signals from clipped measurements. An equivariance-based self-supervised loss is proposed and studied. Performance is assessed on simulated clipped measurements with controlled and varied levels of clipping, and further reported on standard real music signals. We show that the performance of the proposed equivariance-based self-supervised declipping strategy compares favorably to fully supervised learning while requiring only clipped measurements for training.
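To make the measurement model concrete, here is a minimal NumPy sketch of simulating clipped measurements and scoring a reconstruction. It rests on assumptions not stated in the abstract: clipping is taken to be symmetric hard clipping at a threshold `mu`, and the SDR uses the common energy-ratio definition in decibels. The function names `clip_signal` and `sdr_db` are hypothetical helpers for illustration, not the authors' code.

```python
import numpy as np


def clip_signal(x, mu=1.0):
    """Symmetric hard clipping at threshold mu (assumed forward model)."""
    return np.clip(x, -mu, mu)


def sdr_db(x_ref, x_hat, eps=1e-12):
    """Signal-to-distortion ratio in dB (a common definition, assumed here)."""
    signal_energy = np.sum(x_ref ** 2)
    error_energy = np.sum((x_ref - x_hat) ** 2) + eps
    return 10.0 * np.log10(signal_energy / error_energy)


# Toy signal: a 5 Hz sine of amplitude 2, so that part of it exceeds mu = 1.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
x = 2.0 * np.sin(2.0 * np.pi * 5.0 * t)

y = clip_signal(x, mu=1.0)

# Fraction of samples sitting at the clipping level.
saturated_fraction = np.mean(np.abs(y) >= 1.0)

# SDR of the clipped measurement itself, i.e. the "do nothing" baseline.
baseline_sdr = sdr_db(x, y)
```

Raising the signal amplitude or lowering `mu` increases `saturated_fraction`, which is one simple way to obtain controlled and varied levels of clipping in a simulation.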

Paper Structure (18 sections, 20 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Learning to declip signals from measurement data alone. We propose a new self-supervised method that can learn to declip audio signals without ever seeing ground-truth reference signals by exploiting carefully chosen assumptions on the invariance of the reconstructed signal distribution.
  • Figure 2: Comparison of $\mathcal{L}_{\textrm{NMC}}$ and $\mathcal{L}_{\textrm{MC}}$ with $\mu= 1$. Left: $\mathcal{L}_{\textrm{NMC}}$ in one dimension for two examples, covering the saturated and unsaturated cases. Right: the new $\mathcal{L}_{\textrm{MC}}$ for the same two examples.
  • Figure 3: A toy example where a network with learnable biases is not suitable for a scale-invariant signal set. Top row, from left to right: a signal, the associated measurement, and the reconstruction. Bottom row: the same signal divided by 10 (still contained in $\mathcal{X}$ by assumption) with the associated measurement and reconstruction.
  • Figure 4: Diagram of the model used. A mask is computed from the measurement, and both the mask and the measurement are fed to the network.
  • Figure 5: Average reconstruction performance as a function of the saturated part $v$ and the model dimension $d$. Each value shown is the mean $\textrm{SDR}$ over the test dataset.
  • ...and 1 more figure
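The ideas in the captions of Figures 3 and 4 can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions, not the architecture from the paper: the mask is assumed to flag samples whose magnitude reaches a known threshold `mu`, and the network is a hypothetical two-layer ReLU MLP with random weights. The point it illustrates is a standard fact: a ReLU network without biases is positively homogeneous, so scaling the input by a positive factor scales the output by the same factor, whereas learnable biases break this property.

```python
import numpy as np


def saturation_mask(y, mu=1.0, tol=1e-8):
    """1.0 where |y| reaches the clipping level mu, 0.0 elsewhere (assumed rule)."""
    return (np.abs(y) >= mu - tol).astype(float)


def relu_mlp(x, weights, biases=None):
    """Hypothetical ReLU MLP; pass biases=None for a bias-free network."""
    h = x
    for i, W in enumerate(weights):
        h = W @ h
        if biases is not None:
            h = h + biases[i]
        if i < len(weights) - 1:  # no activation after the last layer
            h = np.maximum(h, 0.0)
    return h


rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((8, 16))]
biases = [rng.standard_normal(16), rng.standard_normal(8)]

x = rng.standard_normal(8)
alpha = 0.1  # same signal divided by 10, as in the toy example of Figure 3

# Bias-free network: f(alpha * x) equals alpha * f(x) for alpha > 0.
homogeneous_without_bias = np.allclose(
    relu_mlp(alpha * x, weights), alpha * relu_mlp(x, weights)
)

# Network with biases: the same identity generally fails.
homogeneous_with_bias = np.allclose(
    relu_mlp(alpha * x, weights, biases), alpha * relu_mlp(x, weights, biases)
)

# Mask for a short clipped measurement.
mask = saturation_mask(np.array([0.2, 1.0, -1.0, 0.5]))
```

In this sketch the mask is computed separately and is not passed through `relu_mlp`; how the mask and the measurement are combined inside the actual model is not specified in the captions.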