Per-channel autoregressive linear prediction padding in tiled CNN processing of 2D spatial data

Olli Niemitalo, Otto Rosenberg, Nathaniel Narra, Olli Koskela, Iivari Kunttu

TL;DR

This work tackles padding artifacts in tiled CNN processing of large spatial data by introducing linear prediction (LP) padding, a differentiable, per-channel autoregressive padding scheme. It presents two least-squares implementations (the covariance method, and the autocorrelation method with a Tukey window) and implements them in JAX with a differentiable Cholesky solver. The authors train and evaluate LP padding within a satellite image super-resolution model (RVSR), showing modest improvements in mean squared error near image borders and reduced sensitivity to the choice of padding method when output cropping is applied. The approach offers a lightweight, practical padding alternative that preserves shift-equivariant properties in tiled CNNs and can benefit geospatial imagery processing where memory constraints necessitate tiling.

Abstract

We present linear prediction as a differentiable padding method. For each channel, a stochastic autoregressive linear model is fitted to the padding input by minimizing its noise terms in the least-squares sense. The padding is formed from the expected values of the autoregressive model given the known pixels. We trained the convolutional RVSR super-resolution model from scratch on satellite image data, using different padding methods. Linear prediction padding slightly reduced the mean square super-resolution error compared to zero and replication padding, with a moderate increase in time cost. Linear prediction padding better approximated satellite image data and RVSR feature map data. With zero padding, RVSR appeared to use more of its capacity to compensate for the high approximation error. Cropping the network output by a few pixels reduced the super-resolution error and the effect of the choice of padding method on the error, favoring output cropping with the faster replication and zero padding methods, for the studied workload.
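
The fitting and prediction steps described in the abstract can be illustrated with a minimal sketch. It is written in NumPy rather than JAX and is not the authors' implementation. It assumes a purely vertical neighborhood (each pixel depends only on the `order` pixels directly above it) and fits over fully known windows only, in the spirit of the covariance variant. The function name `lp_pad_bottom`, its parameters, and the small `ridge` term are hypothetical choices made for this example.

```python
import numpy as np


def lp_pad_bottom(x, order=2, pad=3, ridge=1e-6):
    """Extend a single (H, W) channel downwards by `pad` rows.

    Each pixel is modeled as a linear combination of the `order` pixels
    directly above it; one coefficient set is shared by the whole channel.
    Hypothetical helper for illustration, not the paper's code.
    """
    x = np.asarray(x, dtype=np.float64)
    h, w = x.shape
    # Least-squares system over every known pixel with `order` rows above it.
    targets = x[order:, :].reshape(-1)
    regressors = np.stack(
        [x[order - k:h - k, :].reshape(-1) for k in range(1, order + 1)],
        axis=1,
    )
    # Normal equations. The ridge term is an assumption added here so the
    # solve stays well-posed on flat (e.g. all-zero) inputs.
    gram = regressors.T @ regressors + ridge * np.eye(order)
    coeffs = np.linalg.solve(gram, regressors.T @ targets)
    # Recursive prediction: each new row is the expected value of the AR
    # model given the rows above it, including rows already predicted.
    padded = np.concatenate([x, np.zeros((pad, w))], axis=0)
    for i in range(h, h + pad):
        padded[i] = sum(coeffs[k - 1] * padded[i - k] for k in range(1, order + 1))
    return padded, coeffs


# Usage: a linear ramp is continued linearly by a second-order model.
rows = np.arange(8.0)[:, None]
cols = np.arange(5.0)[None, :]
ramp = rows + 10.0 * cols
padded, coeffs = lp_pad_bottom(ramp)
```

On this ramp the fitted coefficients come out close to (2, -1), which is linear extrapolation from the two rows above. Padding the other three edges could be done by flipping or transposing the input before calling the same function, and a multi-channel input would be handled by fitting each channel independently.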

Paper Structure

This paper contains 15 sections, 15 equations, 20 figures, 1 table.

Figures (20)

  • Figure 1: Satellite images padded using our linear prediction padding method (variant lp6x7).
  • Figure 2: Valid convolution with a $3{\times}3$ kernel with origin in the middle erodes data spatially by a one-pixel-thick layer at each image edge. The receptive field of a single output pixel is illustrated.
  • Figure 3: Illustration of some rectangular 2D neighborhoods next to the pixel of interest, defining the linear dependency structure of a downwards causal AR model. The extended neighborhood $1{\times}1$ can be defined by $h = [(1, 0), (0, 0)]$ and $2{\times}1$ by $h = [(2, 0), (0, 0), (1, 0)]$.
  • Figure 4: Downwards linear prediction padding with a $4{\times}5$ neighborhood. Left: predicted and neighborhood pixels at the corners of the rectangular area of coordinates over which MSE is calculated during fitting. Right: corner handling prevents narrowing of the recursive prediction front.
  • Figure 5: The convolutional RVSR super-resolution model. The spatial sizes of convolution kernels were $N{\times}N$ for Conv-$N$. Bilinear image upscaling methods in JAX and PyTorch implicitly replication-pad their inputs. We padded the upscale input explicitly, with the method configurable separately from the padding of Conv inputs. The RepConv block was converted to a single Conv-3 for inference.
  • ...and 15 more figures