Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems trained with Gradient Descent

Nathan Buskulic, Jalal Fadili, Yvain Quéau

TL;DR

The paper addresses the theoretical guarantees for unsupervised neural networks solving inverse problems when trained with gradient descent. By extending Kurdyka-Łojasiewicz-based recovery guarantees from gradient flow to discrete gradient-descent dynamics with a suitable learning rate $\gamma$, it shows convergence to zero loss and provable recovery under a restricted injectivity condition, with discretization effects captured by a fixed constant. It further derives probabilistic overparametrization bounds for a two-layer DIP to achieve these guarantees with high probability, and provides numerical validation on synthetic and image-like data illustrating the trade-offs between conditioning, network width, and early stopping. Overall, the work provides a rigorous bridge between continuous-time guarantees and practical, discrete optimization for unsupervised inverse problems, guiding architecture choices and step-size regimes for reliable reconstructions.

Abstract

Advanced machine learning methods, and more prominently neural networks, have become standard to solve inverse problems over the last years. However, the theoretical recovery guarantees of such methods are still scarce and difficult to achieve. Only recently did unsupervised methods such as Deep Image Prior (DIP) get equipped with convergence and recovery guarantees for generic loss functions when trained through gradient flow with an appropriate initialization. In this paper, we extend these results by proving that these guarantees hold true when using gradient descent with an appropriately chosen step-size/learning rate. We also show that the discretization only affects the overparametrization bound for a two-layer DIP network by a constant and thus that the different guarantees found for the gradient flow will hold for gradient descent.
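To make the setting concrete, here is a minimal NumPy sketch of gradient descent on an unsupervised two-layer network for a linear inverse problem. It is an illustration under stated assumptions, not the authors' implementation: the architecture $\mathbf{g}(\mathbf{u},\pmb{\theta}) = \mathbf{V}\tanh(\mathbf{W}\mathbf{u})/\sqrt{k}$, the squared-error loss, training both layers, the Gaussian initialization, and the default step-size `0.2 / ||A||^2` are all choices made here for the example. The names `two_layer_dip` and `train_dip` are hypothetical.

```python
import numpy as np


def two_layer_dip(u, W, V):
    """Assumed two-layer network: g(u, theta) = V @ tanh(W @ u) / sqrt(k)."""
    k = W.shape[0]
    h = np.tanh(W @ u)
    return V @ h / np.sqrt(k), h


def train_dip(A, y, k=500, d=20, gamma=None, n_iters=1000, seed=0):
    """Plain gradient descent on 0.5 * ||A g(u, theta) - y||^2 (illustrative)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)               # fixed latent input, unit norm
    W = rng.standard_normal((k, d))      # hidden-layer weights
    V = rng.standard_normal((n, k))      # output-layer weights
    if gamma is None:
        # Heuristic step-size chosen for this sketch, not the paper's bound.
        gamma = 0.2 / np.linalg.norm(A, 2) ** 2
    losses = []
    for _ in range(n_iters):
        x, h = two_layer_dip(u, W, V)
        r = A @ x - y                    # residual in observation space
        losses.append(0.5 * float(r @ r))
        g_x = A.T @ r                    # gradient of the loss w.r.t. x
        grad_V = np.outer(g_x, h) / np.sqrt(k)
        grad_z = (V.T @ g_x) / np.sqrt(k) * (1.0 - h ** 2)  # tanh' = 1 - tanh^2
        grad_W = np.outer(grad_z, u)
        V -= gamma * grad_V              # gradient-descent update with step gamma
        W -= gamma * grad_W
    x, _ = two_layer_dip(u, W, V)
    return x, np.array(losses)


# Usage: noiseless observations from a well-conditioned, invertible operator.
rng = np.random.default_rng(1)
n = 10
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(np.linspace(1.0, 2.0, n)) @ Q2.T   # singular values in [1, 2]
x_true = rng.standard_normal(n)
x_hat, losses = train_dip(A, A @ x_true)
print("initial loss:", losses[0], "final loss:", losses[-1])
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The operator in the usage example is square and invertible, so driving the loss to zero forces the reconstruction towards `x_true`. With an under-determined or ill-conditioned operator, or with noisy observations, recovery would additionally depend on the restricted injectivity condition and on early stopping, as the summary above indicates.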

Paper Structure

This paper contains 14 sections, 5 theorems, 30 equations, 4 figures.

Key Result

Theorem 3.1

Consider a network $\mathbf{g}(\mathbf{u},\cdot)$, a forward operator $\mathbf{A}$ and a loss $\mathcal{L}_\mathbf{y}$ such that our assumptions hold. Let $(\pmb{\theta}_{\tau})_{\tau\in\mathbb{N}}$ be the sequence generated by gradient descent (eq:grad_descent). There exists a constant $L > 0$ such that if $\gamma \in$ ... where $R'$ and $R$ obey ... with $\nu_1=\frac{1+\gamma L}{1-\gamma L / 2} \in ]1,4]$, then the following holds ...
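The step-size condition is cut off in the statement above, but the stated range of $\nu_1$ can be unpacked directly. As a sketch, assume that eq:grad_descent denotes the standard iteration with constant step-size $\gamma$ (an assumption, since the equation itself is not reproduced here), and write $t = \gamma L$:

$$
\begin{aligned}
\pmb{\theta}_{\tau+1} &= \pmb{\theta}_{\tau} - \gamma \nabla_{\pmb{\theta}} \mathcal{L}_\mathbf{y}\big(\mathbf{A}\mathbf{g}(\mathbf{u},\pmb{\theta}_{\tau})\big), \\
\nu_1(t) &= \frac{1+t}{1-t/2}, \qquad \nu_1'(t) = \frac{3/2}{(1-t/2)^2} > 0 \quad \text{for } t \in [0,2[, \\
\nu_1(0) &= 1, \qquad \nu_1(1) = \frac{2}{1/2} = 4 .
\end{aligned}
$$

Since $\nu_1$ is strictly increasing on $[0,2[$, the range $\nu_1 \in ]1,4]$ corresponds exactly to $\gamma L \in ]0,1]$, that is, a step-size of at most $1/L$. This is only what the quoted range implies; the precise condition on $\gamma$ is the one given in the paper.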

Figures (4)

  • Figure 1: Level of overparametrization needed for eq:bndR2 to hold, compared to the level required to converge in practice.
  • Figure 2: Probability over 50 runs for a network to converge to an optimal solution for various $n$ and $\gamma$.
  • Figure 3: Deep Inverse Prior applied to image reconstruction.
  • Figure 4: Evolution of the reconstruction error with a well-conditioned operator and a high level of noise.

Theorems & Definitions (12)

  • Definition 2.1
  • Definition 2.2: KL inequality
  • Theorem 3.1
  • Corollary 3.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • proof
  • ...and 2 more