CoordGate: Efficiently Computing Spatially-Varying Convolutions in Convolutional Neural Networks

Sunny Howard, Peter Norreys, Andreas Döpp

TL;DR

CoordGate tackles the inefficiency of learning spatially varying convolutions in CNNs by gating a standard CNN feature map with a coordinate-encoding network to produce per-pixel filter amplitudes. The gating uses a Hadamard product between the CNN output and a coordinate-derived gating map, enabling spatially varying filtering with minimal parameter overhead. It is validated on a 1D synthetic spatially varying convolution task and a 2D image deblurring task with static PSFs applied to microscopy data, outperforming CoordConv-UNet and MultiWienerNet while using far fewer parameters than deep baselines. The approach promises improved efficiency and accuracy for spatially-aware vision tasks in optical imaging and related domains.

Abstract

Optical imaging systems are inherently limited in their resolution due to the point spread function (PSF), which applies a static, yet spatially-varying, convolution to the image. This degradation can be addressed via Convolutional Neural Networks (CNNs), particularly through deblurring techniques. However, current solutions face certain limitations in efficiently computing spatially-varying convolutions. In this paper we propose CoordGate, a novel lightweight module that uses a multiplicative gate and a coordinate encoding network to enable efficient computation of spatially-varying convolutions in CNNs. CoordGate allows for selective amplification or attenuation of filters based on their spatial position, effectively acting like a locally connected neural network. The effectiveness of the CoordGate solution is demonstrated within the context of U-Nets and applied to the challenging problem of image deblurring. The experimental results show that CoordGate outperforms conventional approaches, offering a more robust and spatially aware solution for CNNs in various computer vision applications.
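
The gating mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the single-layer "CNN", the two-layer ReLU MLP, the hidden width of 16, and the normalisation of coordinates to [-1, 1] are all assumptions made for the example. It shows only the core idea that a coordinate network yields one amplitude per filter and per pixel, which then multiplies the convolution output element-wise.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Zero-padded ('same') 2D cross-correlation.
    x: (H, W) image; kernels: (F, k, k) with odd k. Returns (F, H, W)."""
    f, k, _ = kernels.shape
    p = k // 2
    h, w = x.shape
    xp = np.pad(x, p)  # zero padding on every side
    out = np.zeros((f, h, w))
    for i in range(k):
        for j in range(k):
            # Each kernel tap scales a shifted view of the padded image.
            out += kernels[:, i, j, None, None] * xp[i:i + h, j:j + w]
    return out

def coord_mlp(coords, w1, b1, w2, b2):
    """Hypothetical two-layer MLP: (H, W, 2) coordinates -> (H, W, F) gates."""
    hidden = np.maximum(coords @ w1 + b1, 0.0)  # ReLU (an assumption)
    return hidden @ w2 + b2

def coord_gate(x, kernels, mlp_params):
    """Gate the conv features with a coordinate-derived map (Hadamard product)."""
    h, w = x.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    coords = np.stack([ys, xs], axis=-1)      # (H, W, 2)
    features = conv2d_same(x, kernels)        # (F, H, W)
    gate = coord_mlp(coords, *mlp_params)     # (H, W, F)
    return features * np.moveaxis(gate, -1, 0)  # element-wise gating

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
kernels = rng.normal(size=(4, 3, 3))
params = (rng.normal(size=(2, 16)), np.zeros(16),
          rng.normal(size=(16, 4)), np.ones(4))
y = coord_gate(x, kernels, params)  # shape (4, 8, 8)
```

Because the gate depends only on position, the same shared filters can be amplified or attenuated differently at each pixel. If the gate is constant (for example, zero output weights and unit biases in the MLP), the module reduces to an ordinary convolution.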

Paper Structure (8 sections, 7 equations, 7 figures)

Figures (7)

  • Figure 1: The position encoding effects from 'same' padding. (a): Convolving a uniform input with a $3\times 3$ uniform kernel 5 times. (b, c): The same effect for two U-Net architectures, containing 2 and 4 steps of down-and-up sampling respectively. Before each dimension-changing operation, three $3\times3$ convolutions were applied, except for the middle layer in (b), where 12 additional convolutions were applied so that each model had the same total number of convolutions.
  • Figure 2: CoordGate. The data, $\mathbf{X}$, and coordinates, $\mathbf{C}$, are fed through a CNN and an MLP respectively, and the Hadamard product of the resulting tensors is then taken.
  • Figure 3: Approximations of the convolution matrix by different models. Also shown is a plot of PSNR against inference time for each model, with the spot size proportional to the number of parameters in the model.
  • Figure 4: (a): The backbone U-Net architecture. A model with depth, $d$, has $n_c[d]$ channels in its deepest layer. The yellow CoordGate arrows are added for the CG U-Net($d$) models. (b): A plot of PSNR against the logarithm of the number of parameters, demonstrating the advantage of adding CoordGate to the U-Net architecture. Also included are the CoordConv-UNet and MultiWienerNet models.
  • Figure 5: An example showing the blurring and subsequent de-blurring process using our CoordGate technique.
  • ...and 2 more figures
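
The effect described in the caption of Figure 1(a) can be reproduced with a short NumPy sketch. It assumes zero padding for the 'same' convolution and a normalised (mean) $3\times 3$ kernel; the 11×11 input size and the function name are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def box_filter_same(x):
    """One 3x3 uniform (mean) convolution with zero 'same' padding."""
    h, w = x.shape
    xp = np.pad(x, 1)  # zeros around the border
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + w]
    return out / 9.0

x = np.ones((11, 11))  # uniform input
for _ in range(5):     # five successive convolutions, as in Figure 1(a)
    x = box_filter_same(x)

# The zeros in the padding leak inwards by one pixel per convolution, so the
# output is no longer uniform: values fall off towards edges and corners.
centre, edge, corner = x[5, 5], x[0, 5], x[0, 0]
```

In this sketch the centre pixel is still far enough from the border to be unaffected after five passes, while edge and corner pixels are progressively attenuated. The output therefore carries implicit information about position even though neither the input nor the kernel does.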