Deep Image Prior

Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

TL;DR

This work shows that the structure of a randomly initialized convolutional generator encodes a strong low-level image prior, enabling effective single-image restoration (denoising, super-resolution, and inpainting) without any training data. The image is parameterized as $x = f_\theta(z)$ for a fixed input $z$, and the network parameters $\theta$ are optimized to fit the degraded image. The network therefore acts as a handcrafted prior whose strength comes from the architecture (e.g., an hourglass with skip connections) rather than from learned weights. The approach is competitive with non-learned baselines and, on some tasks, approaches state-of-the-art learned methods. It also supports applications such as natural pre-image inversion and activation maximization. The findings highlight the importance of architectural priors in image generation and restoration, and suggest that future advances may come from designing networks whose implicit priors match natural image statistics, rather than relying solely on large datasets.
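
In symbols, the contrast drawn above is between the usual prior-regularized energy minimization and its reparametrized version. The restatement below uses the notation of the figure captions ($E$ for the task-dependent data term, $R$ for an explicit prior); it is meant for orientation and does not reproduce the paper's numbered equations.

$$x^* = \arg\min_{x}\; E(x, x_0) + R(x) \qquad \text{(explicit prior)}$$

$$\theta^* = \arg\min_{\theta}\; E\big(f_\theta(z), x_0\big), \qquad x^* = f_{\theta^*}(z) \qquad \text{(deep image prior)}$$

No $R$ appears in the second line. The prior is implicit in the choice of $f$, the fixed input $z$, and, for tasks like denoising, in stopping the optimization early.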

Abstract

Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is attributed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. To do so, we show that a randomly initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash/no-flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity. Code and supplementary material are available at https://dmitryulyanov.github.io/deep_image_prior .
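
The abstract names three standard inverse problems. In each of them, the reparametrized fit changes only the data term $E(x, x_0)$. Below is a minimal NumPy sketch, assuming squared-error energies on single-channel (H, W) arrays. Denoising compares $x$ with $x_0$ directly. Inpainting compares them only where a binary mask $m$ marks known pixels. Super-resolution compares a downsampled $d(x)$ with the low-resolution $x_0$. The function names are hypothetical, and the block-average used for $d$ is a simple stand-in, not necessarily the downsampling operator used in the paper.

```python
import numpy as np


def denoising_energy(x, x0):
    """E(x, x0) = ||x - x0||^2: fit the noisy observation directly."""
    return float(np.sum((x - x0) ** 2))


def inpainting_energy(x, x0, mask):
    """E(x, x0) = ||(x - x0) * m||^2, where m is 1 on known pixels and 0 on holes."""
    return float(np.sum(((x - x0) * mask) ** 2))


def box_downsample(x, factor):
    """Hypothetical stand-in for the downsampling operator d(.): average
    non-overlapping factor x factor blocks of an (H, W) array whose sides
    are divisible by factor."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def super_resolution_energy(x, x0, factor):
    """E(x, x0) = ||d(x) - x0||^2: the high-resolution estimate x must
    downsample to the low-resolution observation x0."""
    return float(np.sum((box_downsample(x, factor) - x0) ** 2))
```

For example, `super_resolution_energy(x, box_downsample(x, 2), 2)` is zero for any array `x` with even sides. Many different high-resolution images share the same downsampled version, which is the zero-energy manifold sketched in Figure 3 (left).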

Paper Structure

This paper contains 16 sections, 11 equations, 21 figures, 2 tables.

Figures (21)

  • Figure 1: Super-resolution using the deep image prior. Our method uses a randomly-initialized ConvNet to upsample an image, using its structure as an image prior; similar to bicubic upsampling, this method does not require learning, but produces much cleaner results with sharper edges. In fact, our results are quite close to state-of-the-art super-resolution methods that use ConvNets learned from large datasets. The deep image prior works well for all inverse problems we could test.
  • Figure 2: Image restoration using the deep image prior. Starting from random weights $\theta_0$, we iteratively update them to minimize the data term $E(f_\theta(z), x_0)$. At every iteration $t$ the weights $\theta$ are mapped to an image $x = f_\theta(z)$, where $z$ is a fixed tensor and the mapping $f$ is a neural network with parameters $\theta$. The image $x$ is used to compute the task-dependent loss $E(x, x_0)$. The gradient of the loss with respect to the weights $\theta$ is then computed and used to update the parameters.
  • Figure 3: Restoration with priors (image-space visualization). We consider the problem of reconstructing an image $x_{\text{gt}}$ from a degraded measurement $x_0$, and distinguish two cases. Left: in the first case, exemplified by super-resolution, the ground-truth solution $x_{\text{gt}}$ belongs to the manifold of points with zero energy, $\{x : E(x, x_0) = 0\}$ (shown in gray), and optimization can land on a point $x^*$ still quite far from $x_{\text{gt}}$ (purple curve). Adding a conventional prior $R(x)$ modifies the energy so that the minimizer $x^*$ is closer to the ground truth (green curve). The deep image prior has a similar effect, but achieves it by altering the optimization trajectory through re-parametrization, often with better results than conventional priors. Right: in the second case, exemplified by denoising, the ground truth has non-zero cost, $E(x_{\text{gt}}, x_0) > 0$. Here, if run for long enough, fitting with the deep image prior will reach a solution with near-zero cost that is quite far from $x_{\text{gt}}$. However, the optimization path often passes close to $x_{\text{gt}}$, and early stopping (here at time $t_3$) recovers a good solution. Below, we show that the deep image prior often helps for problems of both types.
  • Figure 4: Learning curves for the reconstruction task using a natural image, the same image plus i.i.d. noise, the same image with randomly scrambled pixels, and white noise. Natural-looking images converge much faster, whereas noise is resisted: the parametrization fits it much more slowly.
  • Figure 5: "Samples" from the deep image prior. We show images produced by ConvNets with random weights from independent random uniform noise. Each column shows two images $f_\theta(z)$ for the same architecture, the same input noise $z$, and two different random $\theta$. The following architectures are visualized: a) an hourglass architecture with one downsampling and one bilinear upsampling layer, b) a deeper hourglass architecture with three downsampling and three bilinear upsampling layers, c) an even deeper hourglass architecture with five downsampling and five bilinear upsampling layers, d) same as (c), but with skip connections (each skip connection has a convolution layer), e) same as (d), but with nearest-neighbor upsampling. Note how the resulting images are far from independent noise and correspond to stochastic processes producing spatial structures with clear self-similarity (e.g., each image has a distinctive palette). The scale of structures naturally changes with the depth of the network. "Samples" for hourglass networks with skip connections (U-Net type) combine structures of different scales, as is typical for natural images.
  • ...and 16 more figures
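
The loop in Figure 2, together with the early stopping in Figure 3 (right), can be sketched for denoising in a few lines of NumPy. This is a minimal sketch under stated assumptions. The generator is a tiny two-layer fully connected network on a 1-D signal, not the paper's convolutional hourglass. It therefore reproduces the mechanics of the method (fixed $z$, random $\theta_0$, gradients taken only with respect to $\theta$, a fixed iteration budget) but not the architectural prior itself. The function name `fit_deep_prior`, the input distribution, the learning rate, and the `stop_at` budget are all hypothetical choices.

```python
import numpy as np


def fit_deep_prior(x0, n_hidden=64, n_iters=500, lr=1e-3, stop_at=None, seed=0):
    """Denoise a 1-D signal x0 by fitting x = f_theta(z) to it.

    Minimizes E(f_theta(z), x0) = ||f_theta(z) - x0||^2 over theta = (W1, W2)
    by plain gradient descent; z is drawn once and never updated.
    Assumption: f_theta(z) = W2 @ relu(W1 @ z), a toy stand-in for a ConvNet.
    `stop_at` (hypothetical) caps the number of iterations (early stopping).
    Returns the final estimate and the per-iteration data-term values.
    """
    rng = np.random.default_rng(seed)
    n = x0.size
    z = rng.uniform(0.0, 1.0, size=n)  # fixed random input tensor
    # Random initial weights theta_0.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n_hidden, n))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n, n_hidden))

    steps = n_iters if stop_at is None else min(n_iters, stop_at)
    losses = []
    for _ in range(steps):
        # Forward pass: map the current weights to an image x = f_theta(z).
        h_pre = W1 @ z
        h = np.maximum(h_pre, 0.0)
        x = W2 @ h
        r = x - x0
        losses.append(float(r @ r))  # data term E(x, x0)

        # Backward pass: chain rule from dE/dx = 2 r to both weight matrices.
        g_x = 2.0 * r
        g_W2 = np.outer(g_x, h)
        g_h = W2.T @ g_x
        g_W1 = np.outer(g_h * (h_pre > 0), z)

        # Gradient step on theta only; z stays fixed.
        W1 -= lr * g_W1
        W2 -= lr * g_W2

    x_hat = W2 @ np.maximum(W1 @ z, 0.0)
    return x_hat, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 32))
    noisy = clean + 0.1 * rng.normal(size=clean.size)
    x_hat, losses = fit_deep_prior(noisy, stop_at=200)
    print(f"E at start: {losses[0]:.3f}, after {len(losses)} steps: {losses[-1]:.3f}")
```

With 64 hidden units and a 32-sample signal, this toy generator is expressive enough to reproduce the noisy input exactly if run long enough, which is the failure mode in Figure 3 (right). Here `stop_at` stands in for choosing the stopping time in advance, since the ground truth is unavailable in practice.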
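
The "samples" in Figure 5 come from running an untrained network once on fixed uniform noise. The NumPy sketch below does the same for a skip-free hourglass in the spirit of variants (a) to (c). It uses nearest-neighbor upsampling (as in variant (e)) because that is the simplest to write. Every concrete choice here is an assumption for illustration: the function names, 8 channels, 3x3 kernels, reflect padding, average-pool downsampling, ReLU placement, and the weight scale. Calling it with one `z_seed` and two different `theta_seed` values mirrors a single column of Figure 5.

```python
import numpy as np


def conv3x3(x, w):
    """3x3 'same' cross-correlation of x (C_in, H, W) with w (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="reflect")
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Shifted window of the padded input, mixed across channels.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def downsample2(x):
    """2x2 average pooling along H and W; both must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2_nearest(x):
    """Nearest-neighbor 2x upsampling along H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def random_hourglass_sample(size=32, channels=8, depth=3, z_seed=0, theta_seed=0):
    """One "sample" f_theta(z) from an untrained, skip-free hourglass network."""
    if size % (2 ** depth) or size // (2 ** depth) < 2:
        raise ValueError("size must be divisible by 2**depth with a core of at least 2x2")
    z = np.random.default_rng(z_seed).uniform(0.0, 1.0, size=(channels, size, size))
    rng = np.random.default_rng(theta_seed)  # random weights theta, never trained

    def random_weights(c_out):
        return rng.normal(0.0, 1.0 / np.sqrt(9 * channels), size=(c_out, channels, 3, 3))

    x = z
    for _ in range(depth):  # encoder: downsample, then conv + ReLU
        x = np.maximum(conv3x3(downsample2(x), random_weights(channels)), 0.0)
    for _ in range(depth):  # decoder: upsample, then conv + ReLU
        x = np.maximum(conv3x3(upsample2_nearest(x), random_weights(channels)), 0.0)
    return conv3x3(x, random_weights(3))  # 3 output channels, like an RGB image


if __name__ == "__main__":
    a = random_hourglass_sample(z_seed=0, theta_seed=1)
    b = random_hourglass_sample(z_seed=0, theta_seed=2)  # same z, different theta
    print(a.shape, float(np.abs(a - b).mean()))
```

The result is an unnormalized (3, H, W) array. Rescaling it to [0, 1] and transposing it to (H, W, 3) gives an image that can be displayed.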