Normalizing Basis Functions: Approximate Stationary Models for Large Spatial Data

Antony Sikorski, Daniel McKenzie, Douglas Nychka

TL;DR

Two fast and accurate algorithms for the normalization step are introduced. They allow efficient prediction on fine grids and can be adapted to other basis function methods operating on regular grids.

Abstract

In geostatistics, traditional spatial models often rely on the Gaussian Process (GP) to fit stationary covariances to data. It is well known that this approach becomes computationally infeasible when dealing with large data volumes, necessitating the use of approximate methods. A powerful class of methods approximates the GP as a sum of basis functions with random coefficients. Although this technique offers computational efficiency, it does not inherently guarantee a stationary covariance. To mitigate this issue, the basis functions can be "normalized" to maintain a constant marginal variance, avoiding unwanted artifacts and edge effects. This allows for the fitting of nearly stationary models to large, potentially non-stationary datasets, providing a rigorous base to extend to more complex problems. Unfortunately, the process of normalizing these basis functions is computationally demanding. To address this, we introduce two fast and accurate algorithms for the normalization step, allowing for efficient prediction on fine grids. The practical value of these algorithms is showcased in a spatial analysis of a large dataset, where significant computational speedups are achieved. While implementation and testing are done specifically within the LatticeKrig framework, these algorithms can be adapted to other basis function methods operating on regular grids.

Paper Structure

This paper contains 21 sections, 33 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: A simple, one-dimensional scenario, where a small amount of noise is added to data sampled from a quadratic. Left: Both normalized and un-normalized basis function models are fitted to the data. Right: Finite difference approximation of the gradient for both fits reveals the artifacts more clearly.
  • Figure 2: Top row: $\varphi_1(s),\ldots,\varphi_5(s)$, where $\psi$ is a Wendland polynomial, dilated to yield large overlap (left) and small overlap (right). Bottom row: the corresponding $\Phi \in \mathbb{R}^{21\times 5}$, demonstrating the sparsity pattern. While less overlap results in a sparser $\Phi$ (right column), it also results in a less uniform variance (magenta dotted line in top row).
  • Figure 3: Diagrammatic description of the proposed method. In this case $r = 5$ and $\tilde{n} = 13$, with basis function centers denoted by white points. The calculation is then upsampled to a grid with $n = 31$. A visual comparison of the true (exact) variance on the finer grid is provided. The log of the values is taken in the FFT domain to provide a better visual demonstration.
  • Figure 4: Timing results on a logarithmic scale for each method, with $r$ varied across panels: 25 (top left), 35 (top right), 50 (bottom left), 100 (bottom right), and $n$ taking the values (500, 750, 1000, 1250, 1500, 2000). Dots represent the median time over 5 iterations of the simulation, and shading spans the minimum and maximum values.
  • Figure 5: Mean and maximum error on a logarithmic scale for the FFT normalization method. As in Figure 4, both $r$ (25, 35, 50, 100) and $n$ (500, 750, 1000, 1250, 1500, 2000) are varied.
  • ...and 2 more figures
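
The setting of Figure 2 can be mimicked numerically to show what "normalizing" means. The sketch below is only an illustration under stated assumptions, not the authors' implementation or the LatticeKrig API. It assumes one common Wendland polynomial, $\psi(d) = (1-d)^4(4d+1)$ on $[0,1]$, independent unit-variance coefficients (identity coefficient covariance), and two hypothetical overlap factors (2.5 and 1.0). All function names are invented for this example. It builds $\Phi \in \mathbb{R}^{21\times 5}$, computes the marginal variance $\mathrm{diag}(\Phi \Phi^\top)$, and rescales each row of $\Phi$ so that the variance is constant.

```python
import numpy as np

def wendland(d):
    """One common Wendland polynomial: (1 - d)^4 (4d + 1) for d < 1, else 0."""
    d = np.abs(d)
    return np.where(d < 1.0, (1.0 - d) ** 4 * (4.0 * d + 1.0), 0.0)

def basis_matrix(s, centers, overlap):
    """Phi[i, j] = psi(|s_i - u_j| / (overlap * delta)), delta = center spacing."""
    delta = centers[1] - centers[0]
    return wendland((s[:, None] - centers[None, :]) / (overlap * delta))

def marginal_variance(Phi, cov_c=None):
    """diag(Phi C Phi^T); C defaults to the identity (an assumption here)."""
    if cov_c is None:
        cov_c = np.eye(Phi.shape[1])
    return np.einsum("ij,jk,ik->i", Phi, cov_c, Phi)

def normalize(Phi, cov_c=None):
    """Rescale each row of Phi so the marginal variance equals 1 everywhere."""
    return Phi / np.sqrt(marginal_variance(Phi, cov_c))[:, None]

s = np.linspace(0.0, 1.0, 21)       # 21 evaluation points, as in Figure 2
centers = np.linspace(0.0, 1.0, 5)  # 5 basis function centers

results = {}
for overlap in (2.5, 1.0):          # hypothetical "large" and "small" overlap
    Phi = basis_matrix(s, centers, overlap)
    var = marginal_variance(Phi)
    results[overlap] = {
        "Phi": Phi,
        "nonzeros": int(np.count_nonzero(Phi)),
        "var_ratio": float(var.max() / var.min()),
        "var_normalized": marginal_variance(normalize(Phi)),
    }
    print(f"overlap={overlap}: nonzeros={results[overlap]['nonzeros']}, "
          f"max/min variance={results[overlap]['var_ratio']:.3f}")
```

In this toy version the normalization is a single row-wise division because $n = 21$ and $r = 5$ are tiny. The cost the abstract refers to arises when the same variance must be evaluated at every location of a fine prediction grid with a non-trivial coefficient covariance.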