Deep Internal Learning: Deep Learning from a Single Input

Tom Tirer, Raja Giryes, Se Young Chun, Yonina C. Eldar

TL;DR

This survey addresses deep internal learning, in which a deep neural network (DNN) is trained from a single input $x_0$, or a pretrained network is adapted at inference time, in order to restore or generate signals when external data are scarce or mismatched. It covers architecture-based approaches (e.g., Deep Image Prior, Deep Decoder, ZSSR, SinGAN) and optimization-based variants (e.g., DIP-SGLD, Self2Self, SURE/GSURE, BP-TV, PnP/RED), and then discusses test-time adaptation of pretrained models (IDBP-CNN-IA, IAGAN, diffusion-based ADIR) alongside meta-learning strategies (MZSR, MLSR) that reduce fine-tuning costs. A central theme is exploiting self-similarity and the implicit bias of overparameterized networks as priors, enabling high-quality restoration with minimal or no external data. The paper highlights practical trade-offs, such as the need for early stopping in pure internal learning and the computational burden of test-time adaptation, and points to open questions in theoretical guarantees and blind settings. Overall, the survey bridges traditional signal-processing priors and modern deep-learning techniques to enable robust, data-efficient reconstruction and editing across imaging modalities and signals.

Abstract

Deep learning, in general, focuses on training a neural network from large labeled datasets. Yet, in many cases there is value in training a network just from the input at hand. This is particularly relevant in many signal and image processing problems where training data is scarce and diversity is large on the one hand, and on the other, there is a lot of structure in the data that can be exploited. Using this information is the key to deep internal-learning strategies, which may involve training a network from scratch using a single input or adapting an already trained network to a provided input example at inference time. This survey paper aims at covering deep internal-learning techniques that have been proposed in the past few years for these two important directions. While our main focus will be on image processing problems, most of the approaches that we survey are derived for general signals (vectors with recurring patterns that can be distinguished from noise) and are therefore applicable to other modalities.

Paper Structure

This paper contains 12 sections, 30 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Learning curves for the reconstruction task using: a natural image, the same plus i.i.d. noise, the same randomly scrambled, and white noise. Natural-looking images result in much faster convergence, whereas noise is rejected. Figure taken from ulyanov2018deep and used by permission of the authors.
  • Figure 2: Internal learning approaches can be divided into two high level classes: 1) techniques that learn only from a single example, and 2) techniques that take an already trained network and fine-tune it at test time. Several representative methods are presented, some of which can be utilized for both approaches. The vertical axis presents the state of the input sample at test-time ($x_0$ in the paper's notation). Editing/generation techniques require it to be a ground truth ("clean") sample, while reconstruction techniques attempt to recover the unknown ground truth sample from a degraded input sample, under some assumptions about the degradation model. In this review paper, we mainly focus on strategies for signal/image reconstruction, which is a classical task in the signal processing community.
  • Figure 3: There are various strategies to perform internal learning. This figure highlights some of the key concepts / ingredients that are used in internal learning methods.
  • Figure 4: "Zero-shot super-resolution" (ZSSR) approach: given a known downsampling model $f(\cdot)=(\cdot)\downarrow_s$ and the low-resolution observation $x_0={(x_{gt})\downarrow_s}$, internal learning is performed by training a moderate size CNN to map ${(x_0)\downarrow_s}$ to $x_0$ (with patch extraction and augmentations). After the optimization phase, the network is applied on $x_0$ to estimate $x_{gt}$. Figure taken from shocher2018zero and used by permission of the authors.
  • Figure 5: Super-resolution x3 with a bicubic kernel and noise level of $\sqrt{10}/255$. (a) PSNR average over Set5 versus ADAM iteration number; (b) An observed image; (c) DIP recovery; (d) GSURE recovery; (e) DIP-PnP(BM3D) recovery; (f) GSURE-PnP(BM3D) recovery. As observed in (a), the DIP recoveries start fitting the noise at some iteration while the GSURE recoveries do not suffer from this issue. In this example, both DIP and GSURE benefit from an additional prior imposed by the plug-and-play BM3D denoiser. Figures are taken from Abu-Hussein_2022_WACV and used by permission of the authors.
  • ...and 1 more figure
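
The ZSSR procedure described in the Figure 4 caption can be sketched as two steps. The symbols $h_\theta$ (the CNN with parameters $\theta$), $\ell$ (a generic training loss), and $\hat{x}$ (the resulting estimate) are notation assumed here for illustration and do not appear in the caption. The sketch also assumes that $h_\theta$ includes whatever upscaling is needed for its output to be $s$ times larger than its input, and it omits the patch extraction and augmentations that the caption mentions.

```latex
% Step 1 (internal training): fit the network on a pair built from x_0 alone,
% using the known downsampling model to create the "low-resolution" input.
\hat{\theta} \;=\; \arg\min_{\theta}\; \ell\Big( h_{\theta}\big( (x_0)\downarrow_s \big),\; x_0 \Big)

% Step 2 (inference): apply the trained network to the observation itself.
\hat{x} \;=\; h_{\hat{\theta}}(x_0) \;\approx\; x_{gt}
```

The reasoning behind this sketch is that the relation between $(x_0)\downarrow_s$ and $x_0$ is taken to resemble the relation between $x_0$ and $x_{gt}$, so a mapping learned on the first pair is reused on the second.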