Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging

Victor Sechaud; Laurent Jacques; Patrice Abry; Julián Tachella

TL;DR

This work extends self-supervised learning to the non-linear problem of recovering audio and images from clipped measurements, by assuming that the signal distribution is approximately invariant to changes in amplitude.

Abstract

Learning-based methods are now ubiquitous for solving inverse problems, but their deployment in real-world applications is often hindered by the lack of ground-truth references for training. Recent self-supervised learning strategies offer a promising alternative that avoids the need for ground truth. However, most existing methods are limited to linear inverse problems. This work extends self-supervised learning to the non-linear problem of recovering audio and images from clipped measurements, by assuming that the signal distribution is approximately invariant to changes in amplitude. We provide sufficient conditions for learning to reconstruct from saturated signals alone, and a self-supervised loss that can be used to train reconstruction networks. Experiments on both audio and image data show that the proposed approach is almost as effective as fully supervised approaches, despite relying solely on clipped measurements for training.
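The abstract mentions a self-supervised loss built on approximate amplitude invariance, but its exact form is not given here. The sketch below shows one generic way such a loss can be assembled: a plain measurement-consistency term, plus a term that rescales the current estimate, re-clips it, and asks the network to recover the rescaled signal. It is an illustration under stated assumptions, not the paper's loss: the clipping operator `clip_op` with threshold 1, the scale factor `alpha`, the weighting, and the toy reconstructor are all hypothetical.

```python
import numpy as np

def clip_op(x, tau=1.0):
    # Hypothetical saturation model: elementwise hard clipping to [-tau, tau].
    return np.clip(x, -tau, tau)

def mc_loss(y, x_hat, tau=1.0):
    # Plain measurement consistency: re-clipping the estimate should give back y.
    return float(np.mean((clip_op(x_hat, tau) - y) ** 2))

def scale_invariance_loss(f, x_hat, alpha, tau=1.0):
    # If the signal distribution is (approximately) amplitude invariant,
    # alpha * x_hat is a plausible signal: clip it and ask f to undo the clipping.
    x_scaled = alpha * x_hat
    return float(np.mean((f(clip_op(x_scaled, tau)) - x_scaled) ** 2))

def self_supervised_loss(f, y, alpha, tau=1.0, weight=1.0):
    # Hypothetical combination of the two terms; weight balances them.
    x_hat = f(y)
    return mc_loss(y, x_hat, tau) + weight * scale_invariance_loss(f, x_hat, alpha, tau)

y = np.array([0.1, -0.3, 0.5])   # an unsaturated toy measurement
identity = lambda v: v           # a "network" that does nothing
print(self_supervised_loss(identity, y, alpha=1.5))  # rescaled signal stays below the threshold
print(self_supervised_loss(identity, y, alpha=3.0))  # identity cannot undo the new clipping
```

With `alpha=1.5` the rescaled estimate stays below the threshold and the loss is zero, while with `alpha=3.0` one entry saturates and the identity map is penalized. The measurement-consistency term alone would accept the identity, which is why an extra term that exposes the network to new saturation patterns is useful. In practice `alpha` would presumably be sampled at random for each batch.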

Paper Structure

This paper contains 29 sections, 8 theorems, 71 equations, 10 figures, and 5 tables.

Introduction
Related work
Self-supervised learning for inverse problems
Audio declipping
High Dynamic Range images
Signal recovery guarantees for saturated measurements
Analysis for model identification and signal recovery
Problem formulation
Model identification
Inclusion $\mathcal{X} \subset \hat{\mathcal{X}}$
Inclusion $\hat{\mathcal{X}} \subset \mathcal{X}$
Signal recovery
Self-supervised learning approach
Loss functions
Network architecture
...and 14 more sections

Key Result

Proposition 1

Let $G$ be a group. If $\mathcal{X}$ cannot be identified from $\mathcal{Y} = \eta(\mathcal{X})$ and, for all $g \in G$, the transformations $\boldsymbol{T}_g$ commute with $\eta(\cdot)$, that is, $\eta(\boldsymbol{T}_g\boldsymbol{x}) = \boldsymbol{T}_g\,\eta(\boldsymbol{x})$, then $\mathcal{X}$ cannot be identified from the measurement sets $\mathcal{Y}_g = \eta(\boldsymbol{T}_g\mathcal{X})$, $g \in G$, either.
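The intuition is that commutation gives $\mathcal{Y}_g = \eta(\boldsymbol{T}_g\mathcal{X}) = \boldsymbol{T}_g\,\eta(\mathcal{X}) = \boldsymbol{T}_g\mathcal{Y}$, so each transformed measurement set is a known function of $\mathcal{Y}$ and adds no information. The NumPy sketch below makes this concrete under an assumption made only for illustration: $\eta$ is modeled as elementwise hard clipping to $[-1, 1]$. A circular shift (a hypothetical choice of $\boldsymbol{T}_g$) only permutes entries and therefore commutes with any elementwise $\eta$, whereas an amplitude scaling does not, which is consistent with the paper's reliance on invariance to changes in amplitude.

```python
import numpy as np

def eta(x, tau=1.0):
    # Hypothetical saturation operator: elementwise hard clipping to [-tau, tau].
    return np.clip(x, -tau, tau)

def shift(x, k=2):
    # A circular shift only permutes entries, so it commutes with any elementwise eta.
    return np.roll(x, k)

def scale(x, alpha=0.5):
    # An amplitude change; clipping does not commute with it in general.
    return alpha * x

def commutes(T, x):
    # Check eta(T x) == T eta(x) on a single test signal.
    return bool(np.allclose(eta(T(x)), T(eta(x))))

x = np.array([0.2, 1.8, -2.5, 0.7, -0.4])
print(commutes(shift, x))  # True
print(commutes(scale, x))  # False
```

Under scaling by 0.5, the entry 1.8 becomes 0.9 and is no longer saturated, so the measurement of the rescaled signal reveals information that clipping hid in the original measurement.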

Figures (10)

Figure 1: Example in 3 dimensions illustrating the main theorem. The black set represents a particularly challenging scenario in which all saturated signals are projected onto the same point. In contrast, with high probability over $\boldsymbol{A}$, the colored set allows more signals of moderate norm to be recovered, while points of $\mathcal{X}$ beyond a certain radius (related to the signal norm when $\boldsymbol{A}$ is Gaussian) are all projected onto a single corner, showing that the measurement map is not injective beyond this radius.
Figure 2: $\mathcal{L}_\textrm{NMC}$ compared with $\mathcal{L}_\textrm{MC}$. Left: $\mathcal{L}_\textrm{NMC}$ in one dimension for two examples, covering both the saturated and the non-saturated case. Right: the new $\mathcal{L}_\textrm{MC}$ for the same two examples.
Figure 3: An example where the network learns the reconstruction through its bias (third column), which prevents it from reconstructing low-amplitude measurements. Removing the bias makes the network naturally homogeneous and improves the reconstruction. The first row shows a signal $\boldsymbol{x}$ and the second row shows $\frac{\boldsymbol{x}}{10}$.
Figure 4: An image and its associated blending mask. The mask is white where at least one channel is saturated and black where no channel is saturated.
Figure 5: Example where two different signals $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ have the same measurement, $\eta(\boldsymbol{x}_1) = \eta(\boldsymbol{x}_2)$, so the network $f_{\boldsymbol{\theta}}$ cannot recover both (second and fourth columns). Adding randomness makes the two measurements differ, so a network $f'_{\boldsymbol{\theta}}$ can reconstruct the original signals (third and fifth columns).
...and 5 more figures
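The observation in Figure 3 rests on a standard fact: a ReLU network without bias terms is positively homogeneous, $f(\alpha\boldsymbol{x}) = \alpha f(\boldsymbol{x})$ for every $\alpha > 0$, because $\max(\alpha z, 0) = \alpha\max(z, 0)$. Adding biases breaks this property, so a network that relies on them can behave very differently on $\boldsymbol{x}$ and $\boldsymbol{x}/10$. The NumPy sketch below checks this on a toy two-layer network; the architecture, sizes, random weights, and seed are arbitrary choices for illustration and are not the network used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
b1 = rng.standard_normal(16)
b2 = rng.standard_normal(8)

def relu(z):
    return np.maximum(z, 0.0)

def net(x, use_bias):
    # Toy two-layer ReLU network; the bias terms can be switched off.
    h = relu(W1 @ x + (b1 if use_bias else 0.0))
    return W2 @ h + (b2 if use_bias else 0.0)

def is_homogeneous(use_bias, alpha=0.1):
    # Compare f(alpha * x) with alpha * f(x) on a random input (alpha must be > 0).
    x = rng.standard_normal(8)
    return bool(np.allclose(net(alpha * x, use_bias), alpha * net(x, use_bias)))

print(is_homogeneous(use_bias=False))  # True
print(is_homogeneous(use_bias=True))   # False
```

The choice `alpha=0.1` mirrors the $\boldsymbol{x}$ versus $\frac{\boldsymbol{x}}{10}$ comparison in Figure 3. Homogeneity only holds for positive scalings, since ReLU is not odd.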

Theorems & Definitions (16)

Example 1
Proposition 1
proof
Definition 1: Scale invariance
Proposition 2
proof
Remark 1
Definition 2
Theorem 1
Lemma 1
...and 6 more
