The Neural Tangent Link Between CNN Denoisers and Non-Local Filters

Julián Tachella, Junqi Tang, Mike Davies

TL;DR

This work introduces a formal link between CNNs, through their neural tangent kernel (NTK), and well-known non-local filtering techniques such as non-local means or BM3D. It shows that NTK theory accurately predicts the filter associated with networks trained using standard gradient descent, but falls short of explaining the behaviour of networks trained using the popular Adam optimizer.

Abstract

Convolutional Neural Networks (CNNs) are now a well-established tool for solving computational imaging problems. Modern CNN-based algorithms obtain state-of-the-art performance in diverse image restoration problems. Furthermore, it has recently been shown that, despite being highly overparameterized, networks trained with a single corrupted image can still perform as well as fully trained networks. We introduce a formal link between such networks, through their neural tangent kernel (NTK), and well-known non-local filtering techniques such as non-local means or BM3D. The filtering function associated with a given network architecture can be obtained in closed form without needing to train the network, being fully characterized by the random initialization of the network weights. While the NTK theory accurately predicts the filter associated with networks trained using standard gradient descent, our analysis shows that it falls short of explaining the behaviour of networks trained using the popular Adam optimizer. The latter achieves a larger change of weights in hidden layers, adapting the non-local filtering function during training. We evaluate our findings via extensive image denoising experiments.
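
The link between gradient descent and filtering can be illustrated with a small sketch. Under the standard NTK linearization with a squared loss, and assuming here that the network output starts at zero, each gradient descent step on a single noisy signal $y$ acts as $z \leftarrow z + \eta\,\Theta\,(y - z)$, where $\Theta$ is the kernel matrix and $\eta$ the step size. The kernel below is a Gaussian patch-similarity kernel in the style of non-local means, used only as a stand-in; it is not the closed-form NTK of the paper. The function names, the 1D signal, and the `radius`, `bandwidth`, and `step` values are all illustrative assumptions.

```python
import numpy as np

def extract_patches(signal, radius):
    """Return an (n, 2*radius+1) array with one patch centered at each sample."""
    padded = np.pad(signal, radius, mode="edge")  # replicate the border samples
    width = 2 * radius + 1
    return np.stack([padded[i:i + width] for i in range(signal.size)])

def patch_similarity_kernel(signal, radius=2, bandwidth=0.5):
    """Stand-in non-local kernel: Gaussian weights on squared patch distances."""
    patches = extract_patches(signal, radius)
    diffs = patches[:, None, :] - patches[None, :, :]
    mean_sq_dists = (diffs ** 2).mean(axis=-1)
    weights = np.exp(-mean_sq_dists / (2 * bandwidth ** 2))
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1

def kernel_gradient_descent(kernel, noisy, step=0.5, iterations=10):
    """Iterate z <- z + step * K (y - z), starting from z = 0 (assumed)."""
    estimate = np.zeros_like(noisy)
    for _ in range(iterations):
        estimate = estimate + step * kernel @ (noisy - estimate)
    return estimate

# Illustrative usage on a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0, 0.0], 20)
noisy = clean + 0.1 * rng.standard_normal(clean.size)
kernel = patch_similarity_kernel(noisy)
denoised = kernel_gradient_descent(kernel, noisy)
```

Because the residual obeys $y - z^{t+1} = (I - \eta\Theta)(y - z^{t})$, the estimate after $t$ steps is $z^{t} = \big(I - (I - \eta\Theta)^{t}\big)\,y$, so the number of iterations controls how strongly the filter is applied.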

Paper Structure

This paper contains 31 sections, 63 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: A convolutional neural network $z$ trained with gradient descent on a single corrupted image can achieve powerful denoising. The left eigenvectors of the Jacobian provide a representation based on patch similarities which is robust to noise.
  • Figure 2: Non-local filter associated with the tangent kernel of a CNN with a single hidden layer. a) The filter can be obtained in closed form as the number of channels tends to infinity, where each $(i,j)$th entry corresponds to the similarity between the patches centered at pixels $i$ and $j$. b) Filter weights for different pixels in the house image, where red/white indicates a higher weight, and blue indicates a zero weight.
  • Figure 3: Results for the 'house' image. PSNR values are reported below each restored image. The best results are obtained by the autoencoder trained with Adam, which is able to provide smoother estimates while preserving sharp edges. However, it provides worse estimates of images with noise-like textures, such as the 'baboon' image (see Appendix I).
  • Figure 4: Comparison of Adam and GD training of an autoencoder with noise at the input, as a function of the number of channels. The PSNR for the 'house' image is shown in the left plot, whereas the average $\ell_2$ and $\ell_\infty$ changes of the weights in hidden layers are shown in the center and right plots, respectively. The error bars denote the maximum and minimum values obtained in 10 Monte Carlo repetitions.
  • Figure 5: First 3 leading eigenvectors of the covariance of the last preactivations, $\Sigma_{a^{L-1}}$, after 500 iterations of training with Adam or gradient descent with different inputs (noise or image).
  • ...and 4 more figures
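
The patch-similarity filter described in the Figure 2 caption can be sketched under a simplifying assumption. The code below evaluates the well-known infinite-width NTK of a fully connected network with one hidden ReLU layer, applied to pairs of flattened patches. Treating this as the $(i,j)$th filter entry is an assumption made for illustration: the exact CNN expression in the paper may differ in scaling and in how patches are formed. The function name `relu_ntk_patch_filter` is hypothetical.

```python
import numpy as np

def relu_ntk_patch_filter(patches):
    """Infinite-width NTK of a one-hidden-layer ReLU network between patches.

    patches: (n, d) array with one flattened patch per pixel.
    Returns an (n, n) matrix whose (i, j) entry measures the similarity
    between patch i and patch j.
    """
    norms = np.linalg.norm(patches, axis=1)
    gram = patches @ patches.T                      # inner products x_i . x_j
    norm_prod = np.outer(norms, norms)              # |x_i| |x_j|
    cos = np.clip(gram / np.maximum(norm_prod, 1e-12), -1.0, 1.0)
    theta = np.arccos(cos)                          # angle between patches
    # Contribution of the hidden-layer weights: x.y * E[relu'(w.x) relu'(w.y)]
    derivative_term = gram * (np.pi - theta) / (2 * np.pi)
    # Contribution of the output weights: E[relu(w.x) relu(w.y)]
    activation_term = norm_prod * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    return derivative_term + activation_term

# Three toy 2-pixel patches: identical direction, orthogonal, and opposite.
toy_patches = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
toy_filter = relu_ntk_patch_filter(toy_patches)
```

In this sketch the weight depends only on the norms of the two patches and the angle between them: it is largest for aligned patches and falls to zero for patches pointing in opposite directions.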