Rotation-Equivariant Self-Supervised Method in Image Denoising

Hanze Liu, Jiahong Fu, Qi Xie, Deyu Meng

TL;DR

The paper addresses the limitations of self-supervised image denoising by building rotation-equivariant priors into the network architecture. It replaces standard convolutions with Fourier-based rotation-equivariant convolutions and provides theoretical bounds on the equivariance errors of downsampling, upsampling, and the full U-Net, showing that rotation symmetry is preserved up to a bounded error. The authors propose AdaReNet, an adaptive framework that fuses the outputs of a rotation-equivariant network and a vanilla network via a MaskNetwork and a self-correcting module, improving denoising over multiple self-supervised baselines. The work demonstrates that incorporating rotation symmetry can enhance robustness and performance in self-supervised denoising, offering a new direction for leveraging geometric priors in low-supervision settings.

Abstract

Self-supervised image denoising methods have garnered significant research attention in recent years because they reduce the need for large training datasets. Compared to supervised methods, self-supervised methods rely more on the priors embedded in the deep networks themselves. As a result, most self-supervised methods are built on Convolutional Neural Network (CNN) architectures, which capture one of the most important image priors: translation equivariance. Inspired by the success of translation equivariance, in this paper we explore how to incorporate another important image prior, rotation equivariance. Specifically, we first apply high-accuracy rotation-equivariant convolution to self-supervised image denoising. Through rigorous theoretical analysis, we prove that simply replacing all convolution layers with rotation-equivariant convolution layers turns the network into its rotation-equivariant version. To the best of our knowledge, this is the first time that a rotation-equivariant image prior has been introduced to self-supervised image denoising at the network architecture level with a comprehensive theoretical analysis of equivariance errors, which offers a new perspective for the field. Moreover, to further improve performance, we design a new mask mechanism that fuses the outputs of the rotation-equivariant network and the vanilla CNN-based network, and construct an adaptive rotation-equivariant framework. Through extensive experiments on three typical methods, we demonstrate the effectiveness of the proposed method.
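To make the notion of rotation equivariance concrete, the following is a minimal sketch that checks whether a single convolution-like layer commutes with a 90-degree rotation. It is an illustration under simplifying assumptions, not the paper's method: the paper uses Fourier-based rotation-equivariant convolutions, whereas this sketch uses a much simpler stand-in (a kernel averaged over its four 90-degree rotations, with circular boundary handling). The helper names `circular_correlate`, `c4_symmetrize`, and `rot90_equivariance_error` are hypothetical.

```python
import numpy as np

def circular_correlate(img, kernel):
    """Circular 2D cross-correlation of a square image with an odd-sized kernel."""
    k = kernel.shape[0] // 2
    out = np.zeros_like(img, dtype=float)
    for di in range(-k, k + 1):
        for dj in range(-k, k + 1):
            # Rolling by -d brings img[i + d] to position i (with wrap-around).
            out += kernel[di + k, dj + k] * np.roll(img, shift=(-di, -dj), axis=(0, 1))
    return out

def c4_symmetrize(kernel):
    """Average a kernel over its four 90-degree rotations (a simple stand-in,
    NOT the Fourier-based filter parametrization described in the paper)."""
    return sum(np.rot90(kernel, r) for r in range(4)) / 4.0

def rot90_equivariance_error(layer, img):
    """Max absolute gap between layer(rot90(img)) and rot90(layer(img))."""
    return float(np.max(np.abs(layer(np.rot90(img)) - np.rot90(layer(img)))))

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
plain_kernel = rng.standard_normal((3, 3))
sym_kernel = c4_symmetrize(plain_kernel)

err_plain = rot90_equivariance_error(lambda x: circular_correlate(x, plain_kernel), img)
err_sym = rot90_equivariance_error(lambda x: circular_correlate(x, sym_kernel), img)
print(f"plain kernel error:       {err_plain:.3e}")
print(f"symmetrized kernel error: {err_sym:.3e}")
```

With a generic kernel the two orders of operation disagree noticeably, while the symmetrized kernel agrees up to floating-point round-off. This only covers rotations by multiples of 90 degrees; handling arbitrary angles with small error is what the high-accuracy equivariant convolutions mentioned in the abstract are meant to provide.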

Paper Structure

This paper contains 15 sections, 4 theorems, 22 equations, 6 figures, 6 tables.

Key Result

Theorem 1

Assume that a feature map $F \in \mathbb{R}^{n \times n \times t}$ is discretized from a smooth function $e: \mathbb{R}^2 \times S \rightarrow \mathbb{R}$ with $|S|=t$, that the mesh size is $h$, and that $D(\cdot)$ is the downsampling operator. If, for any $A,B \in S$ and $x \in \mathbb{R}^2$, the following conditions are satisfied […], then the following results are satisfied: […]
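The conditions and bounds of the theorem are not reproduced in this summary, so the sketch below does not implement them. It only illustrates, under stated assumptions, why a downsampling operator can introduce a rotation-equivariance error that depends on the mesh size: a smooth test function (chosen arbitrarily here) is sampled on grids of increasing resolution, and the gap between "rotate then downsample" and "downsample then rotate" is measured for a 90-degree rotation. The stride-2 subsampling and 2x2 average pooling used here are assumed examples of a downsampling operator, not necessarily the $D(\cdot)$ analysed in the paper.

```python
import numpy as np

def sample_smooth(n):
    """Sample a smooth test function on an n x n grid over [-1, 1]^2
    (mesh size h = 2 / (n - 1)). The function itself is an arbitrary choice."""
    xs = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    return np.exp(-(X - 0.3) ** 2 - 2.0 * (Y + 0.2) ** 2)

def subsample(F):
    """Stride-2 subsampling (assumed example of a downsampling operator)."""
    return F[::2, ::2]

def avg_pool(F):
    """2x2 average pooling for even-sized square inputs."""
    n = F.shape[0]
    return F.reshape(n // 2, 2, n // 2, 2).mean(axis=(1, 3))

def rot90_error(down, F):
    """Max absolute gap between down(rot90(F)) and rot90(down(F))."""
    return float(np.max(np.abs(down(np.rot90(F)) - np.rot90(down(F)))))

sizes = (32, 64, 128)
subsample_errors = {n: rot90_error(subsample, sample_smooth(n)) for n in sizes}
avg_pool_errors = {n: rot90_error(avg_pool, sample_smooth(n)) for n in sizes}

for n in sizes:
    print(n, subsample_errors[n], avg_pool_errors[n])
```

On an even-sized grid, stride-2 subsampling picks sample points that are shifted by one pixel after rotation, so its error shrinks as the grid is refined, whereas 2x2 average pooling commutes with 90-degree rotations up to round-off. For general rotation angles, both operators would be expected to incur some discretization error.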

Figures (6)

  • Figure 1: Illustration of the output feature maps of a typical image obtained by a standard CNN and by the rotation-equivariant convolutional neural network used in this work. Both networks are randomly initialized.
  • Figure 2: The network architecture of the equivariant N2N method. The network can be divided into multiple upsampling and downsampling blocks. Each downsampling block (DB) consists of one E-Conv layer and a downsampling operator, while each upsampling block (UB) is composed of an upsampling operator and two E-Conv layers.
  • Figure 3: Illustration of the proposed adaptive network AdaReNet. Here $I \in \mathbb{R}^{H \times W \times C}$ represents a noisy image, where $H$ and $W$ are the spatial dimensions and $C$ is the channel dimension. The Vanilla Module and the EQ Module each produce a preliminary denoising result, denoted $f_c$ and $f_e$ respectively. The Fusion Module $Mask(\cdot)$ automatically decides in which areas of the image relying more on the EQ Module is more beneficial. After adaptive fusion by $Mask(\cdot)$ and correction by the Self-correcting Module $S_c(\cdot)$, the final denoised image $\bar{I}$ is output.
  • Figure 4: (a) An image from the Kodak dataset, (b) the heatmap of the low-frequency component, (c) the heatmap of the high-frequency component, (d) the output of the proposed MaskNetwork (brighter areas indicate greater reliance on the Vanilla Module).
  • Figure 5: N2N: denoising results for one image from Kodak with $\sigma=50$.
  • ...and 1 more figures
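
The fusion step described in the Figure 3 and Figure 4 captions can be sketched as a pixel-wise blend of the two preliminary outputs $f_c$ and $f_e$. The exact fusion formula, the MaskNetwork, and the Self-correcting Module $S_c(\cdot)$ are not given in this summary, so the convex combination below is an assumption made for illustration, and the function name `fuse` is hypothetical. Following the Figure 4 caption, a larger mask value is taken to mean more weight on the Vanilla Module.

```python
import numpy as np

def fuse(f_c, f_e, mask):
    """Blend vanilla output f_c and equivariant output f_e pixel by pixel.

    Assumed form (not confirmed by the source): mask * f_c + (1 - mask) * f_e,
    with mask values in [0, 1]; values near 1 favour the vanilla output.
    """
    mask = np.clip(mask, 0.0, 1.0)
    if mask.ndim == f_c.ndim - 1:
        mask = mask[..., None]  # broadcast an H x W mask over the channel axis
    return mask * f_c + (1.0 - mask) * f_e

# Toy inputs standing in for the two preliminary denoising results (H x W x C).
f_c = np.ones((4, 4, 3))
f_e = np.zeros((4, 4, 3))
mask = np.full((4, 4), 0.25)

fused = fuse(f_c, f_e, mask)
print(fused.shape, fused[0, 0])
```

In the paper's framework the mask is produced by a learned network and the fused result is further passed through the self-correcting module; neither of those components is modelled in this sketch.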

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 1