Surrogate Graph Partitioning for Spatial Prediction
Yuta Shikuri, Hironori Fujisawa
TL;DR
The paper addresses interpretable spatial prediction by formulating a surrogate graph-partitioning problem that minimizes the within-segment variance of predicted values. The problem is cast as a mixed-integer quadratic program (MIQP) $\min_{\bm{W},\bm{v}}\|\bm{W}\bm{v}-\bm{\eta}\|_2^2$, where $\bm{W}\in\{0,1\}^{n\times m}$ assigns each of the $n$ data points to one of $m$ segments. This problem becomes computationally intractable for large $n$, so the paper introduces an approximation based on prior aggregation. It establishes an additive guarantee with constant $c_2=2\|\tilde{\bm{\eta}}-\bm{\eta}\|_2$ and proves that, under certain conditions, data points within a sublabel can share a common label in an optimal solution. The methodology has three parts: Gaussian process regression with variational inference to obtain the predictions $\bm{\eta}$, a prior-aggregation step that reduces the problem size, and a flow-based MIQP formulation of connected graph partitioning that yields interpretable spatial segments. Experiments on the California Housing and National Risk Index datasets show that the MIQP-based segmentation achieves lower within-segment variance than the baselines, while the approximation substantially improves scalability. Connectivity constraints help prevent spatially distant regions from being spuriously merged. Overall, the paper offers a practical, interpretable surrogate modeling framework for spatial prediction with provable approximation guarantees and scalable computation.
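The MIQP objective becomes concrete once $\bm{W}$ is fixed. The minimizing $\bm{v}$ then assigns each segment its mean prediction, so the objective reduces to the total within-segment sum of squared deviations. The sketch below evaluates this on a toy path graph. It makes two assumptions: exhaustive enumeration of all $m^n$ assignments stands in for an MIQP solver, and a breadth-first connectivity check stands in for the paper's flow-based constraints. The function names and data are hypothetical.

```python
import itertools
from collections import deque

import numpy as np


def within_segment_ss(labels, eta, m):
    """min over v of ||W v - eta||_2^2, with W[i, k] = 1 iff labels[i] == k.

    For fixed W the minimising v[k] is the mean of eta over segment k, so the
    value is the total within-segment sum of squared deviations.
    """
    labels = np.asarray(labels)
    total = 0.0
    for k in range(m):
        seg = eta[labels == k]
        if seg.size:  # empty segments contribute nothing
            total += float(np.sum((seg - seg.mean()) ** 2))
    return total


def is_connected(nodes, adjacency):
    """True if `nodes` induce a connected subgraph (breadth-first search)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def best_partition(eta, adjacency, m, require_connected=True):
    """Hypothetical exhaustive search over all m**n assignments (tiny n only)."""
    n = len(eta)
    best_val, best_labels = float("inf"), None
    for labels in itertools.product(range(m), repeat=n):
        if require_connected and not all(
            is_connected([i for i in range(n) if labels[i] == k], adjacency)
            for k in range(m)
        ):
            continue  # some segment is spatially disconnected
        val = within_segment_ss(labels, eta, m)
        if val < best_val:
            best_val, best_labels = val, labels
    return best_val, best_labels


# Path graph 0-1-2-3-4: node 4 has a value close to nodes 0 and 1
# but is not adjacent to them.
eta = np.array([1.0, 1.1, 5.0, 5.2, 1.05])
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
free_val, free_labels = best_partition(eta, adjacency, 2, require_connected=False)
conn_val, conn_labels = best_partition(eta, adjacency, 2)
print(free_labels, free_val)  # node 4 grouped with nodes 0 and 1
print(conn_labels, conn_val)  # segments must be contiguous on the path
```

On this toy graph, dropping the connectivity requirement lets node 4 join nodes 0 and 1 even though it is not adjacent to them. That is the kind of spurious link the connectivity constraints rule out. The $m^n$ enumeration also shows why exact search becomes impractical as $n$ grows.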
Abstract
Spatial prediction refers to the estimation of unobserved values from spatially distributed observations. Although recent advances have improved the capacity to model diverse observation types, practical adoption remains limited in industries that demand interpretability. To bridge this gap, surrogate models that explain black-box predictors offer a promising path toward interpretable decision-making. In this study, we propose a graph partitioning problem to construct spatial segments that minimize the sum of within-segment variances of individual predictions. The assignment of data points to segments can be formulated as a mixed-integer quadratic programming problem. While this formulation can, in principle, identify optimal segments, its computational cost becomes prohibitive as the number of data points increases. Motivated by this challenge, we develop an approximation scheme that leverages the structural properties of graph partitioning. Experimental results demonstrate the computational efficiency of this approximation in identifying spatial segments.
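The TL;DR's additive guarantee can be illustrated numerically. The sketch assumes the guarantee bounds the unsquared residual norm $\min_{\bm{v}}\|\bm{W}\bm{v}-\bm{\eta}\|_2$. This quantity is the distance from $\bm{\eta}$ to the column space of $\bm{W}$, so it is 1-Lipschitz in $\bm{\eta}$. Let $\tilde{\bm{W}}$ be optimal for the aggregated vector $\tilde{\bm{\eta}}$ and $\bm{W}^*$ optimal for $\bm{\eta}$. Applying the Lipschitz bound once for each then gives a gap of at most $2\|\tilde{\bm{\eta}}-\bm{\eta}\|_2$. Three further simplifications apply, and all function names are illustrative rather than the authors' code:

- The aggregation step is a hypothetical stand-in that replaces each value by its sublabel mean.
- Connectivity constraints are omitted.
- Exhaustive search replaces the MIQP solver.

```python
import itertools

import numpy as np


def residual_norm(labels, y, m):
    """min over v of ||W v - y||_2, with W[i, k] = 1 iff labels[i] == k."""
    labels = np.asarray(labels)
    sq = 0.0
    for k in range(m):
        seg = y[labels == k]
        if seg.size:  # the optimal v[k] is the segment mean
            sq += float(np.sum((seg - seg.mean()) ** 2))
    return float(np.sqrt(sq))


def brute_force(y, m):
    """Exact minimiser over all m**n assignments (stand-in for an MIQP solver)."""
    best = min(itertools.product(range(m), repeat=len(y)),
               key=lambda labels: residual_norm(labels, y, m))
    return np.asarray(best)


def aggregate(eta, sublabels):
    """Hypothetical prior aggregation: replace each value by its sublabel mean."""
    eta_tilde = np.empty_like(eta)
    for g in np.unique(sublabels):
        eta_tilde[sublabels == g] = eta[sublabels == g].mean()
    return eta_tilde


def solve_aggregated(eta_tilde, sublabels, m):
    """Search only assignments that give every sublabel a single segment."""
    groups, inverse = np.unique(sublabels, return_inverse=True)
    best = min(itertools.product(range(m), repeat=len(groups)),
               key=lambda g_labels: residual_norm(
                   np.asarray(g_labels)[inverse], eta_tilde, m))
    return np.asarray(best)[inverse]  # expand sublabel labels to data points


eta = np.array([0.9, 1.1, 1.0, 4.8, 5.2, 5.0, 2.9, 3.1])
sublabels = np.array([0, 0, 0, 1, 1, 1, 2, 2])
m = 3

eta_tilde = aggregate(eta, sublabels)
c2 = 2 * np.linalg.norm(eta_tilde - eta)
w_star = brute_force(eta, m)                         # 3**8 = 6561 candidates
w_tilde = solve_aggregated(eta_tilde, sublabels, m)  # 3**3 = 27 candidates
gap = residual_norm(w_tilde, eta, m) - residual_norm(w_star, eta, m)
print(f"gap = {gap:.4f}, c2 = {c2:.4f}")
```

Searching over sublabel-level assignments costs $m^{|G|}$ evaluations for $|G|$ sublabels, instead of $m^n$. Without connectivity constraints, this restriction loses nothing for $\tilde{\bm{\eta}}$. The within-segment cost is concave in how many copies of a repeated value go to each segment, so some optimum keeps all copies together. With connectivity constraints, this argument does not apply directly.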
