Deep Spectral Prior

Yanqi Cheng, Xuxiang Zhao, Tieyong Zeng, Pietro Lio, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

TL;DR

The paper introduces the Deep Spectral Prior (DSP), a frequency-domain unsupervised framework for image reconstruction that operates in the complex domain to learn amplitude and phase directly. It proves that the DSP loss is equivalent to the pixel-domain objective under a unitary Fourier transform, but yields different, more stable descent dynamics than DIP, including a spectral stability law that orders convergence by frequency and eliminates the need for early stopping. Through NTK-based analysis and spectral decompositions, the authors show how DSP progressively recovers low-frequency content while suppressing high-frequency noise, effectively acting as an implicit frequency-domain regulariser. Empirically, DSP outperforms DIP and other baselines across denoising, inpainting, deblurring, restoration, and super-resolution, demonstrating improved fidelity, robustness, and interpretability in a data-free setting. The work presents a unified frequency-based perspective on implicit priors, with strong theoretical and practical implications for single-image reconstruction tasks.

Abstract

We introduce the Deep Spectral Prior (DSP), a new framework for unsupervised image reconstruction that operates entirely in the complex frequency domain. Unlike the Deep Image Prior (DIP), which optimises pixel-level errors and is highly sensitive to overfitting, DSP performs joint learning of amplitude and phase to capture the full spectral structure of images. We derive a rigorous theoretical characterisation of DSP's optimisation dynamics, proving that it follows frequency-dependent descent trajectories that separate informative low-frequency modes from stochastic high-frequency noise. This spectral mode separation explains DSP's self-regularising behaviour and, for the first time, formally establishes the elimination of DIP's major limitation: its reliance on manual early stopping. Moreover, DSP induces an implicit projection onto a frequency-consistent manifold, ensuring convergence to stable, physically plausible reconstructions without explicit priors or supervision. Extensive experiments on denoising, inpainting, and deblurring demonstrate that DSP consistently surpasses DIP and other unsupervised baselines, achieving superior fidelity, robustness, and theoretical interpretability within a unified, unsupervised data-free framework.

Paper Structure

This paper contains 9 sections, 4 theorems, 48 equations, 8 figures, 4 tables.

Key Result

Corollary 2.1

Let $x, \hat{x} \in \mathbb{R}^m$, and let $\mathcal{F} : \mathbb{R}^m \to \mathbb{C}^m$ denote the unitary discrete Fourier transform. Then $\| \mathcal{F}(x) - \mathcal{F}(\hat{x}) \|_2^2 = \| x - \hat{x} \|_2^2$. In particular, for any neural network output $f_\theta(z)$ and observation $y$, $\mathcal{L}_{\mathrm{DSP}}(\theta) = \frac{1}{2}\| \mathcal{F}(\mathcal{A} f_\theta(z)) - \mathcal{F}(y) \|_2^2$ is exactly equal to the pixel-domain reconstruction error $\frac{1}{2}\| \mathcal{A} f_\theta(z) - y \|_2^2$.
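
The norm-preservation property behind Corollary 2.1 can be checked numerically. The sketch below is an illustration only, not the authors' code: the function names `pixel_loss` and `spectral_loss` are hypothetical, random vectors stand in for $\mathcal{A} f_\theta(z)$ and $y$, and NumPy's `norm="ortho"` option is assumed as the unitary DFT.

```python
import numpy as np


def pixel_loss(pred, y):
    """Pixel-domain error: 0.5 * ||pred - y||_2^2."""
    return 0.5 * np.sum((pred - y) ** 2)


def spectral_loss(pred, y):
    """Frequency-domain error under the unitary DFT (norm="ortho")."""
    diff = np.fft.fft(pred, norm="ortho") - np.fft.fft(y, norm="ortho")
    return 0.5 * np.sum(np.abs(diff) ** 2)


rng = np.random.default_rng(0)
m = 64
pred = rng.standard_normal(m)  # hypothetical stand-in for A f_theta(z)
y = rng.standard_normal(m)     # hypothetical stand-in for the observation

loss_pixel = pixel_loss(pred, y)
loss_spectral = spectral_loss(pred, y)

# With the unitary scaling the two losses agree up to floating-point error.
losses_match = bool(np.isclose(loss_pixel, loss_spectral))

# With NumPy's default (unnormalised) forward FFT the spectral value is
# larger by a factor of m, so the unitary scaling is what makes them equal.
loss_unnormalised = 0.5 * np.sum(np.abs(np.fft.fft(pred - y)) ** 2)
unnormalised_matches_scaled = bool(np.isclose(loss_unnormalised, m * loss_pixel))

print(losses_match, unnormalised_matches_scaled)
```

Equal loss values do not imply identical training behaviour: the summary above attributes DSP's different descent dynamics to how the objective is parameterised and optimised in the frequency domain, which this check does not exercise.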

Figures (8)

  • Figure 1: Optimisation trajectories for DIP, DSP, and conventional priors. DIP (red) overshoots and needs manual early stopping; the conventional prior (green) converges to a biased solution; DSP (blue) follows a stable spectral path and converges close to the ground truth without early stopping.
  • Figure 2: Blind denoising comparison on the “Baboon” image: DSP (ours) vs. unsupervised baselines, with zoomed-in views highlighting the preservation of fine details.
  • Figure 3: Blind denoising comparison on the “Plane” image: DSP (ours) vs. unsupervised baselines, with zoomed-in views highlighting the retention of fine details.
  • Figure 4: Restoration results on the “Barbara” image with heavy noise. DSP (ours) preserves structured textures (e.g., fabric) better than DIP, TV, and CSC, as shown in the zoomed-in regions.
  • Figure 5: Comparison on the 4$\times$ super-resolution task for the “Zebra” image: Bicubic, TV, DIP, and the supervised methods (LapSRN, ResShift, and SinSR) versus our proposed DSP.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Definition 2.1
  • Corollary 2.1: Equivalence of Deep Spectral Prior Loss and Pixel-Space Error
  • Theorem 2.1
  • Theorem 2.2
  • Proposition 2.1: Gradient Dynamics in Frequency Space
  • proof
  • proof
  • proof