Consistent Validation for Predictive Methods in Spatial Settings
David R. Burt, Yunyi Shen, Tamara Broderick
TL;DR
The paper tackles validating predictions in spatial settings where test sites are fixed and validation data may densely populate the space. It shows that standard holdout and covariate-shift methods can be inconsistent under infill asymptotics, and it formulates an infill-consistency criterion for validation methods. It develops an adaptive k-nearest-neighbors risk estimator, Spatial Nearest Neighbors (SNN), which uses a bound to select the number of neighbors and is proven consistent under infill asymptotics. Experiments on synthetic grids and on real weather and housing data show that SNN gives more accurate and reliable risk estimates than holdout or 1NN in both grid and point-prediction tasks, supporting its use for model selection and validation in spatial contexts. The work thus offers a principled framework for spatial validation together with a practical, scalable estimator.
Abstract
Spatial prediction tasks are key to weather forecasting, studying air pollution impacts, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where we want to make predictions. This mismatch is often not an instance of covariate shift (as commonly formalized) because the validation and test locations are fixed (e.g., on a grid or at select points) rather than i.i.d. from two distributions. In the present work, we formalize a check on validation methods: that they become arbitrarily accurate as validation data becomes arbitrarily dense. We show that classical and covariate-shift methods can fail this check. We propose a method that builds from existing ideas in the covariate-shift literature, but adapts them to the validation data at hand. We prove that our proposal passes our check. And we demonstrate its advantages empirically on simulated and real data.
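To make the mismatch between validation and test locations concrete, here is a minimal sketch in Python. It is not the authors' SNN estimator: the function names, the fixed `k`, and the toy one-dimensional data are all assumptions made for illustration, and the bound-based choice of the number of neighbors described in the paper is not reproduced. The sketch only contrasts a plain holdout average with a fixed-k nearest-neighbor average of validation losses around each test site.

```python
import numpy as np

def holdout_risk_estimate(val_losses):
    """Classical holdout: average the loss over all validation sites."""
    return float(np.mean(val_losses))

def knn_risk_estimate(val_sites, val_losses, test_sites, k=1):
    """Fixed-k nearest-neighbor risk estimate (illustrative, hypothetical helper).

    For each test site, average the losses observed at its k nearest
    validation sites, then average those values over the test sites.
    """
    val_sites = np.asarray(val_sites, dtype=float)    # shape (n_val, d)
    test_sites = np.asarray(test_sites, dtype=float)  # shape (n_test, d)
    val_losses = np.asarray(val_losses, dtype=float)  # shape (n_val,)
    # Pairwise Euclidean distances, shape (n_test, n_val).
    diffs = test_sites[:, None, :] - val_sites[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Indices of the k nearest validation sites for each test site.
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]
    return float(val_losses[nearest].mean())

# Toy data (assumed): validation sites cluster near 0, where the loss is low,
# while every test site lies near 1, where the loss is high.
val_sites = [[0.0], [0.1], [0.2], [1.0]]
val_losses = [1.0, 1.0, 1.0, 5.0]
test_sites = [[0.9], [1.0], [1.1]]

holdout = holdout_risk_estimate(val_losses)
knn_1 = knn_risk_estimate(val_sites, val_losses, test_sites, k=1)
knn_2 = knn_risk_estimate(val_sites, val_losses, test_sites, k=2)
print(holdout, knn_1, knn_2)
```

In this toy setup the holdout average is dominated by the cluster of validation sites far from the test sites, while the nearest-neighbor averages draw only on validation losses near where predictions are wanted. The gap between `k=1` and `k=2` also hints at why the number of neighbors needs to adapt to the validation data at hand.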
