A parsimonious, computationally efficient machine learning method for spatial regression

Milan Žukovič, Dionissios T. Hristopulos

TL;DR

The paper introduces the modified planar rotator for scattered data (MPRS), a non-parametric, physics-inspired approach that encodes spatial/temporal dependence through short-range interactions within a Boltzmann–Gibbs framework. Predictions are obtained via equilibrium conditional Monte Carlo (restricted Metropolis) updates of spin-like variables mapped from the data, enabling autonomous learning without strict distributional assumptions. Across synthetic and real 1D, 2D, and 3D data, MPRS achieves accuracy competitive with established methods such as ordinary kriging (OK) and inverse distance weighting (IDW), while scaling better computationally, particularly for large datasets and non-Gaussian data such as daily precipitation. The method is autonomous, scalable, and extendable to anisotropy or external fields, making it suitable for massive geospatial and temporal data analyses where traditional kriging becomes impractical.

Abstract

We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent "interactions" without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is an effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
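
To make the idea of conditional Monte Carlo on spin-like variables more concrete, the following is a minimal 1D gap-filling sketch, not the authors' implementation. Several ingredients are assumptions made only for illustration: the linear mapping of data onto angles in $[0, 2\pi]$, the pairwise energy $-J_{i,j}\cos\left((\phi_i-\phi_j)/2\right)$, the Gaussian decay of $J_{i,j}$ with distance, the fixed temperature and proposal width, and the hypothetical function name `gap_fill_1d`. Each prediction point is coupled only to its $n_b$ nearest sampling points, so all prediction points can be updated simultaneously.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def gap_fill_1d(t_obs, z_obs, t_pred, n_b=8, n_sweeps=300, burn_in=100,
                temperature=1e-3, step=0.5, seed=0):
    """Toy conditional Metropolis gap filling (illustrative assumptions only).

    Requires non-constant z_obs and n_sweeps > burn_in.
    """
    rng = np.random.default_rng(seed)
    t_obs = np.asarray(t_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)

    # Assumed mapping: rescale data linearly onto angles in [0, 2*pi].
    z_min, z_max = z_obs.min(), z_obs.max()
    phi_obs = TWO_PI * (z_obs - z_min) / (z_max - z_min)

    # For each prediction point, find its n_b nearest sampling points.
    dist = np.abs(t_pred[:, None] - t_obs[None, :])
    k = min(n_b, t_obs.size)
    nbr = np.argsort(dist, axis=1)[:, :k]
    d = np.take_along_axis(dist, nbr, axis=1)

    # Assumed coupling: Gaussian decay, scaled by the farthest neighbor distance.
    coupling = np.exp(-(d / (d[:, -1:] + 1e-12)) ** 2)
    phi_nbr = phi_obs[nbr]

    def energy(phi):
        # Assumed local energy of each prediction spin given fixed sample spins.
        return -np.sum(coupling * np.cos(0.5 * (phi[:, None] - phi_nbr)), axis=1)

    # Start from the nearest-neighbor interpolation state.
    phi = phi_nbr[:, 0].copy()
    e = energy(phi)
    phi_sum = np.zeros_like(phi)

    for sweep in range(n_sweeps):
        # Restricted proposal: a small symmetric perturbation of the current angle.
        prop = phi + step * (rng.random(phi.size) - 0.5)
        e_prop = energy(prop)
        in_range = (prop >= 0.0) & (prop <= TWO_PI)
        # Metropolis rule: accept with probability min(1, exp(-dE / T)).
        p_acc = np.exp(-np.maximum(e_prop - e, 0.0) / temperature)
        accept = in_range & (rng.random(phi.size) < p_acc)
        phi = np.where(accept, prop, phi)
        e = np.where(accept, e_prop, e)
        if sweep >= burn_in:
            phi_sum += phi

    # Prediction: average the post-burn-in angles and map back to data units.
    phi_mean = phi_sum / (n_sweeps - burn_in)
    return z_min + (z_max - z_min) * phi_mean / TWO_PI


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0.0, 10.0, 200))
    print(gap_fill_1d(t, np.sin(t), np.linspace(0.5, 9.5, 5)))
```

Because the sample spins are held fixed and prediction spins do not interact with each other in this sketch, every sweep reduces to a handful of vectorized array operations. This is one plausible reading of why such a scheme can scale to large data sets. The temperature here is a hand-picked constant rather than something learned from the data.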

Paper Structure (16 sections, 11 equations, 15 figures, 7 tables, 2 algorithms)


Figures (15)

  • Figure 1: Schematic illustration of the interactions of the $i$th prediction point with (a) its four nearest neighbors (including sampling and prediction points) via the constant interaction parameter $J$ in MPR and (b) its $n_b=8$ nearest-neighbor points (sampling points only) via the mutual distance-dependent interaction parameter $J_{i,j}$ in MPRS. Blue open and red filled circles denote sampling and prediction points, respectively, and the solid lines represent the bonds.
  • Figure 2: Energy evolution curves starting from random (red dashed curve) and nearest-neighbor interpolation (blue solid curve) states. The simulations are performed on Gaussian synthetic data with $m= 150$, $\sigma=25$ and Whittle-Matérn covariance model WM($\kappa=0.2,\nu=0.5$), sampled at $346,030$ and predicted at $702,546$ scattered points (non-coinciding with the sampling points) inside a square domain of length $L=1,024$. The inset shows a detailed view focusing on the nonequilibrium (relaxation) regime.
  • Figure 3: Dependence of MPRS and OK validation measures on the ratio of training points $tr$. The measures are calculated from 100 realizations of a Gaussian random field with $m= 150$, $\sigma=25$ and covariance model WM($\kappa=0.2,\nu=0.5$); the field is sampled at $1,000$ scattered points inside a square domain of length $L=50$.
  • Figure 4: Dependence of the ratios of MPRS and OK validation measures on the smoothness parameter. The measures are calculated based on 100 realizations of a Gaussian random field with $m= 150$, $\sigma=25$ and the covariance model WM($\kappa=0.2,\nu$), sampled at $1,000$ scattered points inside a square domain of length $L=50$. Panels (a) and (b) show the results for $tr=0.33$ and $0.66$, respectively.
  • Figure 5: Dependence of the ratios of MPRS and OK validation measures on the random field skewness (controlled by $\sigma$). The measures are calculated from 100 realizations of a lognormal random field with $m= 0$ and covariance model WM($\kappa=0.2,\nu=0.5$), sampled at $1,000$ scattered points inside a square domain of length $L=50$, for $tr=0.33$. The inset shows, as an example, the data distribution for $\sigma=1$.
  • ...and 10 more figures