Scale-Equivariant Imaging: Self-Supervised Learning for Image Super-Resolution and Deblurring

Jérémy Scanvic, Mike Davies, Patrice Abry, Julián Tachella

TL;DR

This work tackles image super-resolution and deblurring in settings where ground-truth high-resolution images are unavailable by introducing scale-equivariant imaging (SEI), a self-supervised framework that leverages scale invariance to recover high-frequency content lost in bandlimited measurements. SEI combines Stein's unbiased risk estimator (SURE) with a scale-equivariant loss (L_SEQ) that uses downscaled reconstructions and gradient stopping to create high-frequency targets without relying on clean references. Theoretically, the authors show that scale transformations enable identifiability of high-frequency information from bandlimited data, a property not shared by roto-translation invariances. Empirically, SEI matches fully supervised performance and outperforms other self-supervised methods across diverse degradations and image distributions, including medical CT data, with promising prospects for fine-tuning and blind-imaging extensions.
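
In symbols, the training objective summarized above can be sketched as follows. This is a reading of the Figure 1 caption rather than the paper's exact formulation: the forward operator $A$, the noise $\varepsilon$, the downscaling transform $T_s$, the stop-gradient notation $\operatorname{sg}(\cdot)$, and the name $\mathcal{L}_{\text{SEI}}$ are notational assumptions introduced here, and normalization constants and expectations over $y$, $s$ and $\varepsilon$ are omitted.

$$
\begin{aligned}
x^{(1)} &= f_\theta(y), \\
x^{(2)} &= \operatorname{sg}\big(T_s\, x^{(1)}\big), \\
x^{(3)} &= f_\theta\big(A x^{(2)} + \varepsilon\big), \\
\mathcal{L}_{\text{SEQ}}(\theta) &= \big\| x^{(3)} - x^{(2)} \big\|_2^2, \\
\mathcal{L}_{\text{SEI}}(\theta) &= \mathcal{L}_{\text{SURE}}(\theta) + \mathcal{L}_{\text{SEQ}}(\theta).
\end{aligned}
$$

Because the gradient is stopped at $x^{(2)}$, it acts as a fixed target during each update: only the reconstruction $x^{(3)}$ is pulled toward the downscaled image, not the other way around.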

Abstract

Self-supervised methods have recently proved to be nearly as effective as supervised ones in various imaging inverse problems, paving the way for learning-based approaches in scientific and medical imaging applications where ground truth data is hard or expensive to obtain. These methods critically rely on invariance to translations and/or rotations of the image distribution to learn from incomplete measurement data alone. However, existing approaches fail to achieve competitive performance in image super-resolution and deblurring, which play a key role in most imaging systems. In this work, we show that invariance to roto-translations is insufficient to learn from measurements that only contain low-frequency information. Instead, we propose scale-equivariant imaging, a new self-supervised approach that leverages the fact that many image distributions are approximately scale-invariant, enabling the recovery of high-frequency information lost in the measurement process. We demonstrate through a series of experiments on real datasets that the proposed method outperforms other self-supervised approaches and performs on par with fully supervised learning.

Paper Structure (22 sections, 4 theorems, 44 equations, 7 figures, 4 tables)

Key Result

Theorem 1

Let ${h}, \varphi \in \mathcal{S}$. If ${h}$ and $\varphi$ are bandlimited with bandwidths $\xi_{h} < \xi_\varphi$, then there are sets $\mathcal{Z}_1, \mathcal{Z}_2 \subseteq \mathcal{S}$ such that

Figures (7)

  • Figure 1: Scale-equivariant imaging. The proposed loss is the sum of the SURE loss $\mathcal{L}_{\text{SURE}}(\theta)$, which penalizes reconstruction error in the measurement domain, and the scale-equivariant loss $\mathcal{L}_{\text{SEQ}}(\theta)$. The latter treats $x^{(1)} = f_\theta(y)$ as a ground truth that might lack high-frequency content and the downscaled image $x^{(2)}$ as another ground truth that does have high-frequency content, and penalizes the mean squared error of the reconstruction $x^{(3)}$ obtained from a noisy measurement of $x^{(2)}$. The difference in high-frequency content is shown by $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$, the discrete Fourier transforms of $x^{(1)}$ and $x^{(2)}$. One of our key contributions is stopping the gradient of $x^{(2)}$ during stochastic gradient descent.
  • Figure 2: Spectral effect of rescaling. In bandlimited problems such as image super-resolution and deblurring, the image frequencies are only observed on a low-frequency band $[0, \xi_h)$ determined by the bandlimiting operator (e.g., blur and anti-aliasing filters), and the goal is to recover texture information in a higher frequency band $(\xi_h, \xi_\varphi)$, where $\xi_\varphi / \xi_h$ can be interpreted as the increase in effective resolution. Scale-invariance makes this possible because the spectral information is available at all scales: an image $z({u})$ and the rescaled image $\Sigma_s z({u})$ with scaling factor $s$ have the same spectral content, but located in different frequency bands.
  • Figure 3: Implementation of the (down-)scaling transformations. Scale-equivariant imaging (SEI) uses rescaled versions of network outputs as learning targets. The rescaling is implemented as bicubic resampling on a coarser grid that is randomly sampled for each batch element. Using a coarser grid keeps the resulting image close to the target manifold $\mathcal{X}$, since its spectral information corresponds to the low-frequency content of the (unknown) target image thanks to the loss $\mathcal{L}_{\text{SURE}}$ and to the scale-invariance of the latent image set $\mathcal{Z}$.
  • Figure 4: Visual comparison for Gaussian deblurring. Top-right corner: PSNR, bottom-left corner: point-spread functions.
  • Figure 5: Point-spread functions. Our method makes no assumption about the shape or strength of the blur kernel, making it broadly applicable. In particular, we verify that it performs well on Gaussian filters with standard deviation $1$, $2$ or $3$ pixels, and on box filters with radius $2$, $3$ or $4$ pixels.
  • ...and 2 more figures
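
The spectral effect described in the Figure 2 caption can be illustrated in one dimension with a minimal pure-Python sketch. It assumes the scaling convention $(\Sigma_s z)(u) = z(s u)$, which may differ from the paper's Definition 2, and uses a single cosine as a stand-in for image content. The helper names `dft_peak`, `sample` and `rescale` are hypothetical and not taken from the paper's code.

```python
import cmath
import math


def dft_peak(signal):
    """Return the index of the dominant non-negative frequency (naive DFT)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return max(range(len(mags)), key=mags.__getitem__)


def sample(z, n):
    """Sample a function of u in [0, 1) on a regular grid of n points."""
    return [z(i / n) for i in range(n)]


def rescale(z, s):
    """Assumed scaling convention: (Sigma_s z)(u) = z(s * u)."""
    return lambda u: z(s * u)


N = 64    # number of grid points
FREQ = 5  # cycles per unit length of the original signal


def z(u):
    """Toy 'image': a single cosine at frequency FREQ."""
    return math.cos(2 * math.pi * FREQ * u)


peak_original = dft_peak(sample(z, N))
peak_rescaled = dft_peak(sample(rescale(z, 2.0), N))
print(peak_original, peak_rescaled)  # prints "5 10"
```

Downscaling by a factor of 2 moves the same content from frequency 5 to frequency 10. In this toy setting, that is the mechanism by which a downscaled reconstruction can carry information in a band that the measurement operator would otherwise never expose.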

Theorems & Definitions (10)

  • Definition 1: Bandlimited, bandwidth
  • Definition 2: Scaling transforms, translations and rotations
  • Theorem 1
  • Theorem 2
  • Definition 3: Fourier transform, inverse Fourier transform
  • Definition 4: Convolution
  • Theorem 2
  • proof
  • Theorem 2
  • proof