Normalization-Equivariant Neural Networks with Application to Image Denoising

Sébastien Herbreteau, Emmanuel Moebel, Charles Kervrann

TL;DR

The paper argues that ordinary convolutional layers and all element-wise activation functions, including the ReLU (rectified linear unit), should be removed from neural networks and replaced by better-conditioned alternatives. It introduces affine-constrained convolutions and channel-wise sort pooling layers as surrogates.

Abstract

In many information processing systems, it may be desirable to ensure that any change of the input, whether by shifting or scaling, results in a corresponding change in the system response. While deep neural networks are gradually replacing all traditional automatic processing methods, they surprisingly do not guarantee such a normalization-equivariance (scale + shift) property, which can be detrimental in many applications. To address this issue, we propose a methodology for adapting existing neural networks so that normalization-equivariance holds by design. Our main claim is that not only ordinary convolutional layers, but also all activation functions, including the ReLU (rectified linear unit), which are applied element-wise to the pre-activated neurons, should be completely removed from neural networks and replaced by better-conditioned alternatives. To this end, we introduce affine-constrained convolutions and channel-wise sort pooling layers as surrogates and show that these two architectural modifications preserve normalization-equivariance without loss of performance. Experimental results in image denoising show that normalization-equivariant neural networks, in addition to their better conditioning, also provide much better generalization across noise levels.
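
As a rough illustration of the property discussed in the abstract, the sketch below assumes the usual reading of normalization-equivariance, namely $f(\lambda x + \mu \mathbf{1}_n) = \lambda f(x) + \mu \mathbf{1}_m$ for $\lambda > 0$ and $\mu \in \mathbb{R}$, and checks it numerically for a single pair $(\lambda, \mu)$. The helper names (`relu`, `sort_pool`, `is_normalization_equivariant`) are hypothetical and are not taken from the authors' code; a single numerical check is only a sanity test, not a proof.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU, used here as the activation to be tested.
    return np.maximum(x, 0.0)

def sort_pool(x):
    # Sort consecutive pairs of entries: (a, b) -> (min(a, b), max(a, b)).
    pairs = x.reshape(-1, 2)
    return np.sort(pairs, axis=1).reshape(x.shape)

def is_normalization_equivariant(f, x, lam, mu, tol=1e-9):
    # Check f(lam * x + mu) == lam * f(x) + mu for one (lam, mu) with lam > 0.
    return bool(np.allclose(f(lam * x + mu), lam * f(x) + mu, atol=tol))

x = np.array([-1.0, 2.0, 0.5, -3.0])
relu_ok = is_normalization_equivariant(relu, x, lam=2.0, mu=0.7)
sort_ok = is_normalization_equivariant(sort_pool, x, lam=2.0, mu=0.7)
```

On this input the ReLU fails the check because a shift moves entries across the zero threshold, whereas pairwise sorting passes because a positive scaling plus a shift never changes the order of two values.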

Paper Structure (36 sections, 5 theorems, 14 equations, 10 figures, 6 tables)

This paper contains 36 sections, 5 theorems, 14 equations, 10 figures, 6 tables.

Key Result

Lemma 1

$f : \mathbb{R}^n \to \mathbb{R}^m$ is entirely determined by its values on the: where $\mathbf{1}_n$ denotes the all-ones vector of $\mathbb{R}^n$.

Figures (10)

  • Figure 1: Influence of normalization for deep-learning-based image denoising. The raw input data is a publicly available real-world noisy image from the Convallaria dataset. "Blind" DnCNN with official pre-trained weights is used for denoising and is applied on four different normalization intervals displayed in red, each of which is included in $[0,1]$, the interval over which the network was trained. PSNR is calculated against the average of $100$ independent noisy static acquisitions of the same sample (called ground truth). Interestingly, the straightforward interval $[0, 1]$ does not give the best results. Normalization intervals are (a) $[0,1]$, (b) $[0.08, 0.12]$, (c) $[0.48, 0.52]$ and (d) $[0.64, 0.96]$. In light of the denoising results (b)-(c) and (b)-(d), DnCNN is neither shift-equivariant nor scale-equivariant.
  • Figure 2: Illustration of the proposed alternative for replacing the traditional scheme "convolution + element-wise activation function" in convolutional neural networks: affine convolutions supersede ordinary ones by restricting the coefficients of each kernel to sum to one and the proposed sort pooling patterns introduce nonlinearities by sorting two by two the pre-activated neurons along the channels.
  • Figure 3: Visual comparisons of the generalization capabilities of a scale-equivariant neural network (left) and its normalization-equivariant counterpart (right) for Gaussian noise. Both networks were trained for Gaussian noise at noise level $\sigma=25$ exclusively. The adaptive filters (rows of $A_\theta^{y_{r}}$ in Prop. \ref{theorem}) are indicated for two particular pixels, together with the sum of their coefficients (note that some weights are negative, indicated in red). The scale-equivariant network tends to excessively smooth out the image when evaluated at a lower noise level, whereas the normalization-equivariant network is more adaptable and considers the underlying texture to a greater extent.
  • Figure 4: Comparison of the performance of our normalization-equivariant alternative with its scale-equivariant and ordinary counterparts for Gaussian denoising with the same architecture on the Set12 dataset. The vertical blue line indicates the single noise level on which the "blind" networks were trained (from left to right: $\sigma = 50$, $\sigma=25$ and $\sigma=10$). In all cases, normalization-equivariant networks generalize much more robustly beyond the training noise level.
  • Figure 5: Denoising results for example images of the form $y = x + \lambda \varepsilon$ (see notations of subsection \ref{subsection_noise}) with $\sigma = 25/255$ and $x \in [0,1]^n$ by "blind" CNNs specialized for noise level $\sigma$ only. $f_\theta^\varnothing$, $f_{\theta}^{\text{SE}}$ and $f_{\theta}^{\text{NE}}$ denote the ordinary, scale-equivariant and normalization-equivariant variants, respectively. In order to get the best results with $f_\theta^\varnothing$ and $f_{\theta}^{\text{SE}}$, it is necessary to know the renormalization parameters $(\lambda, \mu)$ such that $(x- \mu)/ \lambda$ belongs to $\mathcal{D} \subset [0, 1]^n$ (see subsection \ref{subsection_noise}). Note that for $f_{\theta}^{\text{SE}}$, it is however sufficient to know only $\mu$, as $\lambda$ is implicit by construction. In contrast, $f_{\theta}^{\text{NE}}$ can be applied directly.
  • ...and 5 more figures
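
The scheme described in the caption of Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sum-to-one constraint is imposed here by a simple projection of each kernel (`project_to_affine`, a hypothetical helper), the convolution is one-dimensional with no bias and no padding, and channels are paired as (0, 1), (2, 3), and so on. All function names are illustrative.

```python
import numpy as np

def project_to_affine(w):
    # Shift each output-channel kernel so that its coefficients sum to one.
    # w has shape (c_out, c_in, k). Assumed way of imposing the constraint.
    c_out = w.shape[0]
    flat = w.reshape(c_out, -1)
    flat = flat - (flat.sum(axis=1, keepdims=True) - 1.0) / flat.shape[1]
    return flat.reshape(w.shape)

def affine_conv1d(x, w):
    # x: (c_in, n), w: (c_out, c_in, k). "Valid" correlation, no bias, no padding
    # (zero padding or a bias term would break the equivariance check below).
    c_out, c_in, k = w.shape
    n_out = x.shape[1] - k + 1
    out = np.zeros((c_out, n_out))
    for i in range(n_out):
        patch = x[:, i:i + k]  # shape (c_in, k)
        out[:, i] = np.tensordot(w, patch, axes=([1, 2], [0, 1]))
    return out

def channel_sort_pool(x):
    # x: (c, n) with even c. Sort channels two by two at every position.
    c, n = x.shape
    pairs = x.reshape(c // 2, 2, n)
    return np.sort(pairs, axis=1).reshape(c, n)

def layer(x, w):
    # Affine convolution followed by channel-wise sort pooling.
    return channel_sort_pool(affine_conv1d(x, w))

rng = np.random.default_rng(0)
w = project_to_affine(rng.normal(size=(4, 2, 3)))
x = rng.normal(size=(2, 16))
lam, mu = 3.0, -1.5

# Compare layer(lam * x + mu) with lam * layer(x) + mu.
lhs = layer(lam * x + mu, w)
rhs = lam * layer(x, w) + mu
max_gap = float(np.abs(lhs - rhs).max())
```

Because each kernel sums to one, a constant offset `mu` passes through the convolution unchanged, and sorting commutes with any positive scaling plus shift, so `max_gap` should be at the level of floating-point rounding.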

Theorems & Definitions (6)

  • Definition 1
  • Lemma 1: Characterizations
  • Lemma 2: Operations preserving equivariance
  • Proposition 1
  • Proposition 2
  • Proposition 3