
Sufficient dimension reduction for regression with spatially correlated errors: application to prediction

Liliana Forzani, Rodrigo García Arancibia, Antonella Gieco, Pamela Llop, Anne Yao

TL;DR

This work develops a spatial SDR framework for predicting $Y_{\boldsymbol s}$ from high-dimensional predictors $\boldsymbol X_{\boldsymbol s}$ by formulating model-based inverse regression under two spatial error structures: the SSCM, in which spatial dependence is captured by a cross-covariance matrix $\mathbf H$, and the SEM, in which the errors follow $\mathbb E = \theta \mathbf W \mathbb E + \mathbb U$. The authors derive maximum-likelihood estimators of the reduction $R(\boldsymbol X) = \boldsymbol X \boldsymbol\Delta^{-1} \mathbf A$ under both models and then fit a nonparametric forward regression on the reduced predictors. They propose two kernel-based prediction rules (1k and 2k) that incorporate spatial distance into the weights. They validate the approach through simulations and three real datasets (Meuse zinc concentrations, Ohio school indices, and global GDP growth), reporting substantial predictive gains over full-dimensional methods and over SDR methods that assume independence. Dimension selection is discussed via likelihood-ratio tests, information criteria, and a predictive cross-validation criterion (CV-MPE). Overall, spatial SDR under the SSCM or SEM improves predictive accuracy for both geostatistical and lattice data, and the paper offers practical guidance on choosing the dimension and on distance-aware prediction.
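Because $\mathbb E = \theta \mathbf W \mathbb E + \mathbb U$ is an implicit equation, simulating SEM errors means solving $(\mathbf I_n - \theta\mathbf W)\,\mathbb E = \mathbb U$. The sketch below does this under illustrative assumptions that are not taken from the paper: $\mathbb E$ is the $n \times p$ error matrix of the inverse regression, $\mathbf W$ is a row-standardized $k$-nearest-neighbour weight matrix (the helper `knn_weights` is hypothetical), and the rows of $\mathbb U$ are i.i.d. $N_p(\mathbf 0, \boldsymbol\Delta)$. With a row-standardized $\mathbf W$, any $|\theta| < 1$ keeps $\mathbf I_n - \theta\mathbf W$ invertible.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Row-standardized k-nearest-neighbour spatial weight matrix (assumed choice of W)."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a site is never its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k closest sites
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_sem_errors(W, theta, Delta, rng):
    """Draw E solving E = theta * W @ E + U, with rows of U iid N(0, Delta) (assumption)."""
    n, p = W.shape[0], Delta.shape[0]
    U = rng.multivariate_normal(np.zeros(p), Delta, size=n)   # n x p innovations
    E = np.linalg.solve(np.eye(n) - theta * W, U)             # E = (I - theta W)^{-1} U
    return E, U
```

Solving the linear system avoids forming $(\mathbf I_n - \theta\mathbf W)^{-1}$ explicitly, which is cheaper and numerically safer for larger $n$.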

Abstract

In this paper, we address the problem of predicting a response variable from data that are both spatially correlated and high-dimensional. To reduce the dimensionality of the predictor variables, we apply the sufficient dimension reduction (SDR) paradigm, which reduces the predictor space while retaining relevant information about the response. To achieve this, we impose two different spatial models on the inverse regression: the separable spatial covariance model (SSCM) and the spatial autoregressive error model (SEM). For these models, we derive maximum likelihood estimators for the reduction and use them to predict the response via nonparametric rules for forward regression. Through simulations and real data applications, we demonstrate the effectiveness of our approach for spatial data prediction.
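The final step, nonparametric forward regression of the response on the reduced predictors, can be illustrated with a Nadaraya-Watson estimator whose weights also shrink with geographic distance. This is a minimal sketch, assuming Gaussian kernels and a product of a predictor-space kernel and a spatial kernel; it is not necessarily the paper's 1k or 2k rule, and the names `predict_spatial_nw`, `h_r` and `h_s` are hypothetical.

```python
import numpy as np

def gaussian_kernel(d2, h):
    """Gaussian kernel evaluated at squared distances d2 with bandwidth h."""
    return np.exp(-0.5 * d2 / h**2)

def predict_spatial_nw(R_train, S_train, y_train, R_new, S_new, h_r, h_s):
    """Kernel-weighted (Nadaraya-Watson) prediction of y at new sites.

    R_*: reduced predictors, shape (n, d); S_*: site coordinates, shape (n, 2).
    h_r, h_s: bandwidths for predictor-space and geographic distance (hypothetical).
    """
    # Squared Euclidean distances between every new point and every training point.
    d2_r = ((R_new[:, None, :] - R_train[None, :, :]) ** 2).sum(axis=-1)
    d2_s = ((S_new[:, None, :] - S_train[None, :, :]) ** 2).sum(axis=-1)
    # Product kernel: a training site gets large weight only if it is close in both senses.
    w = gaussian_kernel(d2_r, h_r) * gaussian_kernel(d2_s, h_s)
    return (w @ y_train) / w.sum(axis=1)
```

In practice the bandwidths would be tuned, for example by cross-validation; very small bandwidths can make every weight underflow to zero, in which case the prediction is undefined (`nan`).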

Paper Structure

This paper contains 20 sections, 3 theorems, 64 equations, 8 figures, 2 tables.

Key Result

Theorem 2.1

If $\mathbb{X} \mid \mathbf Y$ has the log-likelihood specified by the model, then a sufficient reduction for the regression of $\mathbf Y \mid \mathbb{X}$ is given by $R(\mathbb X) = \mathbb X \boldsymbol\Delta^{-1} \mathbf A$, where $\mathbf A$ is a basis for $\mathrm{span}\{\boldsymbol\mu_{Y_{\mathbf s_i}} - \boldsymbol\mu : Y_{\mathbf s_i} \in \mathcal{S}_Y\}$ with $\boldsymbol\mu = E(\boldsymbol\mu_{Y_{\mathbf s_i}})$ for …
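To make the roles of $\mathbf A$ and $\boldsymbol\Delta$ concrete, the sketch below builds a crude plug-in version of $R = \mathbb X\boldsymbol\Delta^{-1}\mathbf A$: it slices $Y$ by quantiles, takes $\mathbf A$ as an orthonormal basis of the centred slice means $\boldsymbol\mu_{y} - \boldsymbol\mu$ (via an SVD), and uses the pooled within-slice covariance for $\boldsymbol\Delta$. This is an illustrative moment-based estimate that assumes independent rows and ignores spatial correlation, so it is not the maximum-likelihood estimator derived in the paper under the SSCM or SEM; `slice_reduction` and its parameters are hypothetical, and each slice must be non-empty (true for a continuous response).

```python
import numpy as np

def slice_reduction(X, y, n_slices, d):
    """Plug-in sketch of R(X) = X Delta^{-1} A from sliced conditional means.

    Assumes independent rows (no spatial correlation), slices y by quantiles,
    and uses the pooled within-slice covariance as Delta.
    """
    n, p = X.shape
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_slices - 1)
    mu = X.mean(axis=0)            # equals the proportion-weighted mean of slice means
    M = np.zeros((n_slices, p))    # rows: centred slice means mu_y - mu
    Delta = np.zeros((p, p))
    for h in range(n_slices):
        Xh = X[labels == h]
        mu_h = Xh.mean(axis=0)
        M[h] = mu_h - mu
        C = Xh - mu_h
        Delta += C.T @ C
    Delta /= n - n_slices          # pooled within-slice covariance
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    A = Vt[:d].T                   # p x d orthonormal basis of span{mu_y - mu}
    return X @ np.linalg.solve(Delta, A), A   # n x d reduced predictors, and A
```

The row-vector convention matches $R(\boldsymbol X) = \boldsymbol X\boldsymbol\Delta^{-1}\mathbf A$: each row of `X` is one site's predictor vector, and each row of the result is its $d$-dimensional reduction.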

Figures (8)

  • Figure 1: Average cross-validated MSE (computed over 100 replications) for the SSCM model using different prediction methods across various sample sizes.
  • Figure 2: Average cross-validated MSE (computed over 100 replications) for the SEM model using different prediction methods across various sample sizes.
  • Figure 3: Spatial Distribution of the Logarithm of Zinc Concentration in the Meuse River Floodplain.
  • Figure 4: Cross-validated RMSE (over $100$ replications) for the Zinc Concentration Prediction in Meuse Data Set using SDR methods.
  • Figure 5: Mean Fourth Grade Proficiency Scores in School Districts in Ohio.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Theorem 2.1
  • Proof
  • Proposition 3.1
  • Proposition 4.1