Sufficient dimension reduction for regression with spatially correlated errors: application to prediction
Liliana Forzani, Rodrigo García Arancibia, Antonella Gieco, Pamela Llop, Anne Yao
TL;DR
This work develops a spatial SDR framework for predicting $Y_{\boldsymbol s}$ from high-dimensional $\boldsymbol X_{\boldsymbol s}$ by formulating model-based inverse regression under two spatial error structures: SSCM, where spatial dependence is captured by a cross-covariance matrix $\mathbf H$, and SEM, where the errors follow $\mathbb E = \theta \mathbf W \mathbb E + \mathbb U$. The authors derive maximum-likelihood estimators of the SDR $R(\boldsymbol X) = \boldsymbol X \boldsymbol\Delta^{-1} \mathbf A$ under both models and then fit a nonparametric forward regression of the response on the reduced predictors. They propose two kernel-based prediction rules (1k and 2k) that incorporate spatial distance into the weights. They validate the approach through simulations and three real datasets (Meuse zinc, Ohio school indices, and global GDP growth), reporting substantial predictive gains over full-dimensional prediction and over SDR methods that assume independence. Dimension selection is addressed through likelihood-ratio tests, information criteria, and a predictive cross-validation criterion (CV-MPE). Overall, spatial SDR under SSCM or SEM improves predictive accuracy for both geostatistical and lattice data, and the work offers practical guidance on choosing the dimension and on distance-aware prediction.
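To make the SEM error structure concrete, here is a minimal NumPy sketch. It rests on a few assumptions: the neighbor matrix $\mathbf W$ is a hypothetical row-standardized k-nearest-neighbor matrix built from site coordinates, the rows of $\mathbb U$ are drawn i.i.d. $N(\mathbf 0, \boldsymbol\Delta)$, and $|\theta| < 1$ so that $\mathbf I - \theta \mathbf W$ is invertible. Solving $\mathbb E = \theta \mathbf W \mathbb E + \mathbb U$ for $\mathbb E$ gives $\mathbb E = (\mathbf I - \theta \mathbf W)^{-1} \mathbb U$, which is what the code computes. The function names are illustrative and do not come from the paper.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Hypothetical row-standardized k-nearest-neighbor matrix W (zero diagonal, rows sum to 1)."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between sites
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each site from its own neighborhood
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_sem_errors(W, theta, Delta, rng):
    """Draw E solving E = theta W E + U, i.e. E = (I - theta W)^{-1} U.
    Assumption: rows of U are i.i.d. N(0, Delta)."""
    n, p = W.shape[0], Delta.shape[0]
    L = np.linalg.cholesky(Delta)            # Delta = L L^T
    U = rng.standard_normal((n, p)) @ L.T    # each row of U has covariance Delta
    E = np.linalg.solve(np.eye(n) - theta * W, U)
    return E, U

# Usage: 50 random sites in the unit square, p = 3 error components
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
W = knn_weights(coords, k=4)
theta = 0.6
Delta = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
E, U = simulate_sem_errors(W, theta, Delta, rng)
```

Larger values of $\theta$ strengthen the dependence between neighboring sites, while $\theta = 0$ gives $\mathbb E = \mathbb U$ with independent rows, the setting assumed by standard (non-spatial) SDR.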
Abstract
In this paper, we address the problem of predicting a response variable when the data are both spatially correlated and high-dimensional. To reduce the dimensionality of the predictors, we apply the sufficient dimension reduction (SDR) paradigm, which reduces the predictor space while retaining the information relevant to the response. To this end, we impose two spatial models on the inverse regression: the separable spatial covariance model (SSCM) and the spatial autoregressive error model (SEM). For each model, we derive maximum likelihood estimators of the reduction and use them to predict the response via nonparametric rules for forward regression. Through simulations and real-data applications, we demonstrate the effectiveness of our approach for spatial data prediction.
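The two-step pipeline described above (compute the reduced predictors, then regress the response on them nonparametrically) can be sketched as follows. This is a hypothetical illustration, not the authors' 1k or 2k rule. It takes $\boldsymbol\Delta$ and $\mathbf A$ as given, although in practice they would be the maximum likelihood estimates under SSCM or SEM. It also uses a Nadaraya-Watson estimator whose weight is the product of a Gaussian kernel in the reduced predictor space and a Gaussian kernel in geographic distance. The bandwidths `h_r` and `h_s` and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sdr_reduction(X, Delta, A):
    """Reduced predictors R(X) = X Delta^{-1} A; X is n x p with one row per site, A is p x d."""
    # Solve Delta Z = A rather than forming Delta^{-1} explicitly
    return X @ np.linalg.solve(Delta, A)

def gaussian_kernel(sq_dist, h):
    return np.exp(-0.5 * sq_dist / h**2)

def predict_at_site(R_train, y_train, coords_train, r_new, s_new, h_r, h_s):
    """Hypothetical distance-aware Nadaraya-Watson rule (not the paper's 1k/2k definition):
    the weight of site i is K_{h_r}(||r_i - r_new||) * K_{h_s}(||s_i - s_new||)."""
    w = (gaussian_kernel(np.sum((R_train - r_new) ** 2, axis=1), h_r)
         * gaussian_kernel(np.sum((coords_train - s_new) ** 2, axis=1), h_s))
    if w.sum() == 0.0:
        raise ValueError("all kernel weights underflowed; increase h_r or h_s")
    return np.dot(w, y_train) / w.sum()

# Usage with synthetic data; Delta and A stand in for ML estimates
rng = np.random.default_rng(1)
n, p, d = 100, 10, 2
X = rng.standard_normal((n, p))
Delta = np.eye(p)
A = rng.standard_normal((p, d))
R = sdr_reduction(X, Delta, A)
coords = rng.uniform(size=(n, 2))
y = R[:, 0] + 0.1 * rng.standard_normal(n)
# Predict the response at the first site from the remaining sites (leave-one-out style)
yhat = predict_at_site(R[1:], y[1:], coords[1:], R[0], coords[0], h_r=1.0, h_s=0.2)
```

Because the weights are nonnegative, the prediction is a weighted average of observed responses. Letting `h_s` grow large makes the spatial kernel close to 1 for every site, which reduces the rule to ordinary kernel regression on $R(\boldsymbol X)$.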
