Consistent Validation for Predictive Methods in Spatial Settings

David R. Burt; Yunyi Shen; Tamara Broderick

TL;DR

The paper tackles validating predictions in spatial settings where test sites are fixed and validation data may densely populate the space. It formulates an infill-consistency criterion for validation methods and shows that standard holdout and covariate-shift methods can fail it under infill asymptotics. It then develops a new adaptive k-nearest-neighbors risk estimator, Spatial Nearest Neighbors (SNN), which selects the number of neighbors using a bound, and proves that SNN is consistent under infill asymptotics. Empirical results on synthetic data and on real weather and housing data show that SNN gives more accurate and reliable risk estimates than holdout or 1NN in both grid and point prediction tasks, supporting its use for model selection and validation in spatial contexts. The work thereby offers a principled framework for spatial validation and a practical, scalable estimator with real-world applicability.
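The basic idea of a nearest-neighbor risk estimate fits in a few lines. This is a minimal sketch under stated assumptions, not the authors' SNN implementation: it uses a fixed, user-supplied k, whereas SNN chooses k adaptively from a bound that is not reproduced here. The helper name `knn_test_risk` is hypothetical, and the losses of the predictor at each validation site are assumed to be precomputed.

```python
import numpy as np


def knn_test_risk(val_locs, val_losses, test_locs, k):
    """Hypothetical fixed-k nearest-neighbor estimate of test risk.

    val_locs:   (N, d) array of validation locations
    val_losses: (N,) array of the predictor's losses at those locations
    test_locs:  (M, d) array of test locations (M = 1 for a point task)
    k:          number of neighbors (SNN would choose this adaptively)
    """
    val_locs = np.asarray(val_locs, dtype=float)
    val_losses = np.asarray(val_losses, dtype=float)
    test_locs = np.atleast_2d(np.asarray(test_locs, dtype=float))
    if not 1 <= k <= len(val_losses):
        raise ValueError("k must lie between 1 and the number of validation points")
    # Euclidean distance from every test site to every validation site: (M, N)
    dists = np.linalg.norm(test_locs[:, None, :] - val_locs[None, :, :], axis=-1)
    # Indices of the k closest validation sites for each test site
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Average neighbor losses per test site, then average over test sites
    return float(val_losses[nearest].mean(axis=1).mean())


# Usage: four validation sites at the corners of the unit square
val_locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
val_losses = np.array([1.0, 2.0, 3.0, 4.0])
print(knn_test_risk(val_locs, val_losses, [[0.1, 0.2]], k=1))  # → 1.0
print(knn_test_risk(val_locs, val_losses, [[0.1, 0.2]], k=4))  # → 2.5
```

In this sketch, k = 1 behaves like a 1NN estimate and k = N reproduces the plain holdout average, so the two baselines sit at the extremes of one family; an adaptive rule picks a k in between.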

Abstract

Spatial prediction tasks are key to weather forecasting, studying air pollution impacts, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where we want to make predictions. This mismatch is often not an instance of covariate shift (as commonly formalized) because the validation and test locations are fixed (e.g., on a grid or at select points) rather than i.i.d. from two distributions. In the present work, we formalize a check on validation methods: that they become arbitrarily accurate as validation data becomes arbitrarily dense. We show that classical and covariate-shift methods can fail this check. We propose a method that builds from existing ideas in the covariate-shift literature, but adapts them to the validation data at hand. We prove that our proposal passes our check. And we demonstrate its advantages empirically on simulated and real data.
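The check described in the abstract can be illustrated with a toy one-dimensional simulation. Every ingredient is an assumption chosen for illustration and is not one of the paper's experiments: the true surface f(s) = sin(2πs), a predictor that always outputs 0, Gaussian noise with standard deviation 0.5, squared-error loss, and a single fixed test location s = 0.9, where the target risk is f(0.9)² + 0.5² ≈ 0.595. The helper `toy_infill_check` is hypothetical.

```python
import numpy as np


def toy_infill_check(n_val, seed=0, sigma=0.5, s_test=0.9):
    """Hypothetical toy: holdout, 1NN, and kNN estimates of the risk at one test site.

    Assumed setup (illustrative only): validation sites iid uniform on [0, 1],
    responses y = sin(2*pi*s) + noise with noise sd `sigma`, a predictor that
    always outputs 0, and squared-error loss. The target is the expected loss
    at s_test, namely sin(2*pi*s_test)**2 + sigma**2.
    """
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, 1.0, size=n_val)
    y = np.sin(2 * np.pi * s) + sigma * rng.standard_normal(n_val)
    losses = y ** 2  # squared error of the constant-zero predictor
    # Validation sites ordered by distance to the fixed test site
    order = np.argsort(np.abs(s - s_test))
    k = max(1, int(np.sqrt(n_val)))  # k grows with n_val while k / n_val shrinks
    return {
        "target": float(np.sin(2 * np.pi * s_test) ** 2 + sigma ** 2),
        "holdout": float(losses.mean()),  # weights all of [0, 1] equally
        "1nn": float(losses[order[0]]),  # a single noisy loss
        "knn": float(losses[order[:k]].mean()),
    }


for n in (1_000, 100_000, 1_000_000):
    print(n, toy_infill_check(n))
```

As n grows, the holdout average settles near E[f(S)²] + σ² = 0.75 rather than the target, because it averages over the whole domain; the 1NN value remains a single noisy loss whose spread does not shrink; and the kNN average with k ≈ √n approaches the target. This is only a miniature of the failure the abstract describes; the paper's formal results cover far more general settings.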

Paper Structure (126 sections, 37 theorems, 124 equations, 17 figures, 4 tables, 1 algorithm)

INTRODUCTION
SPATIAL TEST RISK
Test Risk of a Spatial Predictive Method
Estimating Test Risk
WE WANT CONSISTENT ESTIMATORS
CURRENT ESTIMATORS ARE INCONSISTENT
A CONSISTENT ESTIMATOR
Our Bound and Estimator
Our Method is Consistent
EXPERIMENTS
Test Risk Estimation on Synthetic Data
Temperature, Bootstrapped Residuals
Property Sales in England and Wales
Wind Speed Prediction
Temperature, Real Response
...and 111 more sections

Key Result

Proposition 3.2 (Independent and Identically Distributed Data Satisfies an Infill Assumption)

Suppose that $\mathcal{S} = [0,1]^d$, $S^{\textup{val}}_n \stackrel{\textup{\tiny iid}}{\sim} P$ for $1 \leq n \leq N^{\textup{val}}$, and $P$ has Lebesgue density lower bounded by $c >0$ over ${[0,1]^d}$. Let $B_d = \pi^{d/2}/\Gamma(d/2+1)$ be the volume of the $d$-dimensional Euclidean unit ball.
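The constant $B_d$ and the elementary probability step that infill assumptions of this kind typically rest on can be checked numerically. This sketch is an illustration under stated assumptions, not the proposition's proof or its exact conclusion: if a ball of radius $r$ around a location lies inside $[0,1]^d$ and the density is at least $c$ on it, each i.i.d. validation point lands in that ball with probability at least $c B_d r^d$, so the chance that none of $N^{\textup{val}}$ points does is at most $(1 - c B_d r^d)^{N^{\textup{val}}}$. Both function names are hypothetical.

```python
import math


def unit_ball_volume(d):
    """B_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the d-dimensional unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def prob_ball_empty(n_val, r, c, d):
    """Upper bound on P(no iid validation point within distance r of a fixed site).

    Assumes the radius-r ball lies inside [0, 1]^d and the sampling density is
    at least c on it, so each point hits the ball with probability >= c * B_d * r^d.
    """
    p_hit = min(1.0, c * unit_ball_volume(d) * r ** d)
    return (1.0 - p_hit) ** n_val


for n in (100, 1_000, 10_000):
    print(n, prob_ball_empty(n, r=0.05, c=1.0, d=2))
```

The bound decays geometrically in the number of validation points, which is why uniformly positive density is enough to make every fixed neighborhood eventually contain validation data.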

Figures (17)

Figure 1: Error in test risk estimation in the grid prediction task (left) and point prediction task (right) across methods (holdout in blue, 1NN in orange, our SNN in green); lower values correspond to better performance. The vertical axis shows the absolute difference between the estimated test risk and empirical test risk. Each box plot shows the median, inter-quartile range, and outliers based on 100 synthetic datasets. The horizontal axis tracks increasing validation set sizes. Numbers above each box give the count of outliers that fall above the upper limit of the vertical axis.
Figure 2: Signed errors in estimating the test risk for (left to right) the bootstrapped air temperature task with GWR (°C); the same task with GPR; the flat price task (£); and the wind speed task (m/s). The holdout (blue), 1NN (orange), and SNN (green) appear left to right in each plot.
Figure 3: On the left: the training (pink circles), validation (orange diamonds), and test (blue triangle) locations for the blocked spatial validation counterexample described above. On the right: a comparison of our method and the holdout estimator using the spatial block on the dataset described above.
Figure 4: On the left: the training (pink circles), validation (orange diamonds), and test (blue triangle) locations for the blocked spatial validation counterexample described above. On the right: a comparison of our method and the holdout estimator using the spatial block on the dataset described above.
Figure 5: Computation time (in seconds) for the three estimates of validation error in the synthetic experiment with a single test point (left) and with test points on a regular grid (right). The main line shows the mean, and the vertical bars indicate the maximum and minimum. For both datasets, our method takes significantly longer to compute than the baselines, but even with the largest number of validation points used, it takes under a minute.
...and 12 more figures
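The error summaries plotted in Figures 1 and 2 can be computed from per-replicate estimates with a few lines of NumPy. This minimal sketch assumes the signed error is estimate minus empirical test risk (the captions do not fix the sign convention) and that the box-plot statistics use NumPy's default linear-interpolation percentiles; `summarize_errors` is a hypothetical helper.

```python
import numpy as np


def summarize_errors(estimates, empirical_risks):
    """Signed and absolute errors of test-risk estimates across replicate datasets.

    Assumed convention: signed error = estimate - empirical test risk, so a
    positive value means the method overestimated the risk.
    """
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(empirical_risks, dtype=float)
    signed = est - truth  # Figure 2 style
    absolute = np.abs(signed)  # Figure 1 style
    q1, median, q3 = np.percentile(absolute, [25, 50, 75])
    return {
        "signed": signed,
        "absolute": absolute,
        "median_abs": float(median),
        "iqr_abs": float(q3 - q1),  # inter-quartile range of the absolute errors
    }


print(summarize_errors([1.1, 0.9, 1.3], [1.0, 1.0, 1.0]))
```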

Theorems & Definitions (64)

Definition 2.3
Definition 3.1 (citing Cressie, 2015; Wendland, 2004)
Definition 3.2: Consistency of Test Risk Estimation Under Infill Asymptotics
Proposition 3.2: Independent and Identically Distributed Data Satisfies an Infill Assumption
Proposition 4.0: Inconsistency of
Proposition 4.0: Inconsistency of blocked spatial validation
Definition 4.1
Proposition 4.1: Inconsistency of 1NN
Proposition 4.1: Inconsistency of kNN depending on number of validation points
Theorem 5.1: Bound on Estimation Error in Terms of Fill Distance
...and 54 more
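Theorem 5.1 bounds the estimation error in terms of the fill distance. Assuming the standard definition from scattered-data approximation (Wendland, 2004), the fill distance of validation locations $\{s_n\}$ in a domain $\mathcal{S}$ is $h = \sup_{s \in \mathcal{S}} \min_n \lVert s - s_n \rVert$, the radius of the largest gap in coverage; the paper's own definition may differ in detail. The sketch below approximates $h$ on $[0,1]^2$ by maximizing over a regular grid of query points; the helper name and the grid approximation are assumptions for illustration.

```python
import numpy as np


def approx_fill_distance(points, grid_size=101, chunk=128):
    """Approximate the fill distance of `points` in the unit square [0, 1]^2.

    The supremum over [0, 1]^2 is replaced by a maximum over a regular
    grid_size x grid_size grid, so the result is a lower bound on the true
    fill distance, accurate to within about one grid spacing.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    g = np.linspace(0.0, 1.0, grid_size)
    xx, yy = np.meshgrid(g, g)
    queries = np.column_stack([xx.ravel(), yy.ravel()])
    # Distance from each query point to its nearest validation point,
    # accumulated over chunks of validation points to limit memory use
    nearest = np.full(len(queries), np.inf)
    for start in range(0, len(points), chunk):
        block = points[start:start + chunk]
        d = np.linalg.norm(queries[:, None, :] - block[None, :, :], axis=-1)
        nearest = np.minimum(nearest, d.min(axis=1))
    # Largest distance from any query point to the validation set
    return float(nearest.max())


rng = np.random.default_rng(0)
for n in (50, 200, 2000):
    pts = rng.uniform(0.0, 1.0, size=(n, 2))
    print(n, approx_fill_distance(pts))
```

For i.i.d. uniform validation sites the printed fill distance shrinks as n grows, which is the sense in which validation data become "arbitrarily dense".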
