Bridging Theory and Practice in Efficient Gaussian Process-Based Statistical Modeling for Large Datasets

Flávio B. Gonçalves, Marcos O. Prates, Gareth O. Roberts

Abstract

Geostatistics is a branch of statistics concerned with stochastic processes over continuous domains, with Gaussian processes (GPs) providing a flexible and principled modelling framework. However, the high computational cost of simulating or computing likelihoods with GPs limits their scalability to large datasets. This paper introduces the piecewise continuous Gaussian process (PCGP), a new process that retains the rich probabilistic structure of traditional GPs while offering substantial computational efficiency. As will be shown and discussed, existing scalable approaches that define stochastic processes on continuous domains -- such as the nearest-neighbour GP (NNGP) and the radial-neighbour GP (RNGP) -- rely on conditional independence structures that effectively constrain the measurable space on which the processes are defined, which may induce undesirable probabilistic behaviour and compromise their practical applicability, particularly in complex latent GP models. The PCGP mitigates these limitations and provides a theoretically grounded and computationally efficient alternative, as demonstrated through numerical illustrations.
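The computational bottleneck mentioned in the abstract can be made concrete with a minimal sketch of exact GP simulation and likelihood evaluation. Both rely on a Cholesky factorisation of the full $n \times n$ covariance matrix, which costs $O(n^3)$ operations and $O(n^2)$ memory. The function names (`exponential_cov`, `simulate_gp`, `gp_loglik`), the exponential covariance, and all parameter values below are illustrative assumptions, not taken from the paper, and the sketch shows the standard dense-GP baseline rather than the PCGP itself.

```python
import numpy as np


def exponential_cov(locs, sigma2=1.0, phi=0.2):
    """Exponential covariance sigma2 * exp(-d / phi) for all pairs of locations.

    The covariance family and the values of sigma2 and phi are assumptions
    chosen for illustration only.
    """
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / phi)


def simulate_gp(locs, rng, jitter=1e-10):
    """Draw one zero-mean GP realisation at `locs` via a dense Cholesky factor."""
    n = locs.shape[0]
    cov = exponential_cov(locs) + jitter * np.eye(n)
    chol = np.linalg.cholesky(cov)  # O(n^3) time, O(n^2) memory
    return chol @ rng.standard_normal(n)


def gp_loglik(z, locs, jitter=1e-10):
    """Exact zero-mean Gaussian log-likelihood of z observed at `locs`."""
    n = locs.shape[0]
    cov = exponential_cov(locs) + jitter * np.eye(n)
    chol = np.linalg.cholesky(cov)  # the same cubic-cost step
    alpha = np.linalg.solve(chol, z)  # alpha = L^{-1} z, so alpha @ alpha = z' C^{-1} z
    logdet = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
locs = rng.uniform(size=(500, 2))  # 500 random locations in the unit square
z = simulate_gp(locs, rng)
loglik = gp_loglik(z, locs)
print(loglik)
```

With $n = 500$ this runs quickly, but at the 160,000 grid locations used in the paper's figures the dense covariance matrix alone would need roughly 200 GB in double precision, which is why scalable constructions are required.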

Paper Structure

This paper contains 11 sections, 5 theorems, 19 equations, 7 figures, and 5 tables.

Key Result

Theorem 2.1

Highly irregular paths. Let $D_0$ be any countable set dense in $\mathcal{D}\setminus\mathcal{S}$, and let $\mathfrak D$ denote the collection of open hypercubes contained in $\mathcal{D}$ with rational endpoints. Then, with probability one, for every $\mathcal{D}^*\in\mathfrak D$ the set $\{Z_u:u\i

Figures (7)

  • Figure 1: Exact (solid) and NNGP empirical densities for the path integral (left) and maximum (right). Values of $n$ are $3$ (dashed), $9$ (dotted), and $14$ (dot-dashed).
  • Figure 2: Heat map of one realisation of the process at a regular grid of 160,000 locations. NNGP on the left and PCGP on the right.
  • Figure 3: Heat map of one realisation of the process at a regular grid of 160,000 locations using $k=12\times12$ squares. PCGP on the left and mPCGP on the right.
  • Figure S.4: Examples of grids for the mPCGP with $G=2$ (left) and $G=4$ (right). Each colour represents one partition.
  • Figure S.5: Heat map of one realisation of the process at a regular grid of 160,000 locations for values $r=200$ and $m=5$. NNGP on the left and PCGP on the right.
  • ...and 2 more figures

Theorems & Definitions (9)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 3.1: Existence of the PCGP
  • Theorem 1
  • Proof of Theorem \ref{RPT}
  • Proof of Theorem \ref{mMT}
  • Proof of Theorem \ref{CPS}
  • Proof of Theorem \ref{Exst_PCGP}