Early Stopping of Untrained Convolutional Neural Networks

Tim Jahn, Bangti Jin

TL;DR

It is established that the classical discrepancy principle is an adequate method for early stopping of two-layer untrained convolutional neural networks learned by gradient descent, and furthermore, it yields an approximation with minimax optimal convergence rates.

Abstract

In recent years, new regularization methods based on (deep) neural networks have shown very promising empirical performance for the numerical solution of ill-posed problems, e.g., in medical imaging and imaging science. Due to the nonlinearity of neural networks, these methods often lack satisfactory theoretical justification. In this work, we rigorously discuss the convergence of a successful unsupervised approach that utilizes untrained convolutional neural networks to represent solutions to linear ill-posed problems. Untrained neural networks are particularly appealing for many applications because they do not require paired training data. The regularization property of the approach relies solely on the architecture of the neural network instead. Due to the vast over-parameterization of the employed neural network, suitable early stopping is essential for the success of the method. We establish that the classical discrepancy principle is an adequate method for early stopping of two-layer untrained convolutional neural networks learned by gradient descent, and furthermore, it yields an approximation with minimax optimal convergence rates. Numerical results are also presented to illustrate the theoretical findings.
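The early-stopping idea in the abstract can be illustrated on a much simpler model. The following is a minimal sketch, assuming a plain linear gradient descent (Landweber) iteration in place of the paper's two-layer untrained convolutional network. The function name `landweber_discrepancy`, the constant `kappa`, and the synthetic diagonal test problem are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def landweber_discrepancy(A, y_noisy, noise_level, kappa=1.1, max_iter=10000):
    """Gradient descent on 0.5 * ||A x - y||^2, stopped by the discrepancy principle.

    Stops at the first iteration k with ||A x_k - y_noisy|| <= kappa * noise_level.
    `kappa` > 1 is a hypothetical safety factor chosen for this sketch.
    """
    # Step size 1 / ||A||_2^2 keeps the residual norm non-increasing.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for k in range(max_iter):
        residual = A @ x - y_noisy
        if np.linalg.norm(residual) <= kappa * noise_level:
            return x, k  # discrepancy principle satisfied
        x = x - step * (A.T @ residual)
    return x, max_iter  # iteration budget exhausted without stopping


# Synthetic ill-posed problem (assumed for illustration):
# diagonal forward operator with polynomially decaying singular values.
n = 50
idx = np.arange(1, n + 1)
A = np.diag(idx ** -1.0)
x_true = idx ** -1.5

rng = np.random.default_rng(0)
noise_level = 1e-2
noise = rng.standard_normal(n)
noise *= noise_level / np.linalg.norm(noise)  # noise norm equals noise_level
y_noisy = A @ x_true + noise

kappa = 1.1
x_hat, k_stop = landweber_discrepancy(A, y_noisy, noise_level, kappa=kappa)
print("stopped at iteration", k_stop)
print("residual norm:", np.linalg.norm(A @ x_hat - y_noisy))
print("reconstruction error:", np.linalg.norm(x_hat - x_true))
```

The rule needs only the noise level and the current residual, so it can be evaluated during training without access to the ground truth.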

Paper Structure (10 sections, 7 theorems, 115 equations, 2 figures, 4 tables)

Key Result

Theorem 2.1

Let $\Sigma(U)$ and $A^\top A\in\mathbb{R}^{n\times n}$ have a common eigenbasis $(w_i)_{i=1}^n$ with corresponding polynomially decaying eigenvalues $\sigma_i^2$ and $\alpha_j^2$: $\frac{\sigma_i^2}{i^{-p}} \in [b_\Sigma,B_\Sigma]$ and $\frac{\alpha_j^2}{j^{-q}} \in [b_A,B_A]$ for so Then, for $0<\delta_\epsilon <\frac{1}{4}$ and the entries of the initial weight matrix $C_0\in\math

Figures (2)

  • Figure 1: Ground truth, noisy data and reconstructions for image deblurring.
  • Figure 2: Training dynamics of the DIP reconstruction: (a) residual errors $\|AG(C_\tau^\epsilon) - y^\epsilon\|$ (and noise level $\epsilon$, indicated by the red horizontal line) and (b) reconstruction errors $\| G(C_\tau^\epsilon) - x^\dagger\|$.
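
Panel (a) of Figure 2 compares the residual with the noise level, which is the comparison the discrepancy principle uses to choose a stopping time. In the notation of the caption, the rule is conventionally written as follows. The factor $\kappa > 1$ is an assumption of this sketch, and the paper's exact constant and noise normalization are not shown in this excerpt.

$$\tau_{\mathrm{dp}} := \min\left\{ \tau \ge 0 \;:\; \|AG(C_\tau^\epsilon) - y^\epsilon\| \le \kappa\,\epsilon \right\}$$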

Theorems & Definitions (22)

  • Theorem 2.1
  • Remark 2.2
  • Remark 2.3
  • Remark 3.4
  • Theorem 3.5
  • Proof 1
  • Remark 3.6
  • Remark 3.7
  • Corollary 3.8
  • Proof 2
  • ...and 12 more