A parsimonious, computationally efficient machine learning method for spatial regression

Milan Žukovič; Dionissios T. Hristopulos

TL;DR

The paper introduces the modified planar rotator for scattered data (MPRS), a non-parametric, physics-inspired approach that encodes spatial/temporal dependence through short-range interactions and a Boltzmann–Gibbs framework. Predictions are obtained via equilibrium conditional Monte Carlo (restricted Metropolis) updates of spin-like variables mapped from the data, enabling autonomous learning without strict distributional assumptions. Across synthetic and real 1D, 2D, and 3D data, MPRS achieves accuracy competitive with established methods such as ordinary kriging (OK) and inverse distance weighting (IDW), while scaling better computationally for large datasets. It is especially effective for non-Gaussian data such as daily precipitation. The method can be extended to include anisotropy or external fields, and it suits massive geospatial and temporal data sets for which traditional kriging becomes impractical.

Abstract

We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent "interactions" without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
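
The gap-filling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linear data-to-angle map, the energy $-\sum_j J_{i,j}\cos\big(q(\phi_i-\phi_j)\big)$, the exponential form of the couplings, and the values of `q`, `temp`, `step` and `n_sweeps` are all assumptions chosen for illustration, and the function name `mprs_like_fill` is hypothetical. The sketch does follow the abstract and the Figure 1 caption in one respect: each prediction point interacts only with its $n_b$ nearest sampling points, so all prediction points can be updated simultaneously.

```python
import numpy as np

def mprs_like_fill(x_obs, z_obs, x_pred, n_b=8, q=0.5, temp=1e-2,
                   step=0.5, n_sweeps=400, seed=0):
    """Illustrative gap filling with Metropolis updates of spin angles.

    x_obs: (n, d) sampling locations, z_obs: (n,) non-constant data,
    x_pred: (m, d) prediction locations. All defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    x_obs, x_pred = np.asarray(x_obs, float), np.asarray(x_pred, float)
    z_obs = np.asarray(z_obs, float)

    # Assumed linear map of the data onto spin angles in [0, 2*pi].
    z_min, z_rng = z_obs.min(), z_obs.max() - z_obs.min()
    phi_obs = 2.0 * np.pi * (z_obs - z_min) / z_rng

    # Brute-force search for the n_b nearest sampling points.
    d = np.linalg.norm(x_pred[:, None, :] - x_obs[None, :, :], axis=2)
    nbr = np.argsort(d, axis=1)[:, :n_b]
    d_nb = np.take_along_axis(d, nbr, axis=1)

    # Hypothetical distance-dependent couplings: exponential decay,
    # scaled by the mean neighbor distance of each prediction point.
    coupling = np.exp(-d_nb / d_nb.mean(axis=1, keepdims=True))
    phi_nb = phi_obs[nbr]

    def energy(phi):
        return -(coupling * np.cos(q * (phi[:, None] - phi_nb))).sum(axis=1)

    phi = phi_nb[:, 0].copy()          # start from nearest-neighbor values
    e_old = energy(phi)
    phi_sum, n_kept = np.zeros_like(phi), 0
    for sweep in range(n_sweeps):
        # Propose a move restricted to a window around the current angle.
        prop = phi + step * (rng.random(phi.size) - 0.5)
        inside = (prop >= 0.0) & (prop <= 2.0 * np.pi)
        e_new = energy(prop)
        # Metropolis rule; log1p(-u) is the log of a uniform number in (0, 1].
        log_u = np.log1p(-rng.random(phi.size))
        accept = inside & (log_u < -(e_new - e_old) / temp)
        phi = np.where(accept, prop, phi)
        e_old = np.where(accept, e_new, e_old)
        if sweep >= n_sweeps // 2:     # average over the second half only
            phi_sum += phi
            n_kept += 1

    # Back-transform the averaged angles to the data scale.
    return z_min + z_rng * (phi_sum / n_kept) / (2.0 * np.pi)
```

Calling `mprs_like_fill(x_obs, z_obs, x_pred)` returns one prediction per row of `x_pred`. The brute-force distance matrix keeps the sketch short but costs $O(nm)$ memory, so a spatial index would be needed at the data sizes quoted in the abstract.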

Paper Structure (16 sections, 11 equations, 15 figures, 7 tables, 2 algorithms)

This paper contains 16 sections, 11 equations, 15 figures, 7 tables, 2 algorithms.

Introduction
The MPRS Model
Model definition
Setting the MPRS Model Parameters and Hyperparameters
Learning "Data Gaps" by Means of Restricted Metropolis Monte Carlo
Study Design for Validation of MPRS Learning Method
Results
Synthetic 2D data
Real 2D spatial data
Ambient gamma dose rates
Jura data set
Walker Lake data set
Atmospheric latent heat release
Time series (temperature and precipitation)
Real 3D spatial data
...and 1 more section

Figures (15)

Figure 1: Schematic illustration of the interactions of the $i$th prediction point with (a) its four nearest neighbors (including sampling and prediction points) via the constant interaction parameter $J$ in MPR and (b) its $n_b=8$ nearest-neighbor points (sampling points only) via the mutual distance-dependent interaction parameter $J_{i,j}$ in MPRS. Blue open and red filled circles denote sampling and prediction points, respectively, and the solid lines represent the bonds.
Figure 2: Energy evolution curves starting from random (red dashed curve) and nearest-neighbor interpolation (blue solid curve) states. The simulations are performed on Gaussian synthetic data with $m= 150$, $\sigma=25$ and Whittle-Matérn covariance model WM($\kappa=0.2,\nu=0.5$), sampled at $346,030$ and predicted at $702,546$ scattered points (non-coinciding with the sampling points) inside a square domain of length $L=1,024$. The inset shows a detailed view focusing on the nonequilibrium (relaxation) regime.
Figure 3: Dependence of MPRS and OK validation measures on the ratio of training points $tr$. The measures are calculated from 100 realizations of a Gaussian random field with $m= 150$, $\sigma=25$ and covariance model WM($\kappa=0.2,\nu=0.5$); the field is sampled at $1,000$ scattered points inside a square domain of length $L=50$.
Figure 4: Dependence of the ratios of MPRS and OK validation measures on the smoothness parameter. The measures are calculated based on 100 realizations of a Gaussian random field with $m= 150$, $\sigma=25$ and the covariance model WM($\kappa=0.2,\nu$), sampled at $1,000$ scattered points inside a square domain of length $L=50$. Panels (a) and (b) show the results for $tr=0.33$ and $0.66$, respectively.
Figure 5: Dependence of the ratios of MPRS and OK validation measures on the random field skewness (controlled by $\sigma$). The measures are calculated from 100 realizations of a lognormal random field with $m= 0$ and covariance model WM($\kappa=0.2,\nu=0.5$), sampled at $1,000$ scattered points inside a square domain of length $L=50$, for $tr=0.33$. The inset shows, as an example, the data distribution for $\sigma=1$.
...and 10 more figures
