Markov Random Fields with Proximity Constraints for Spatial Data

Sudipto Saha, Jonathan R. Bradley

Abstract

The conditional autoregressive (CAR) model, the simultaneous autoregressive (SAR) model, and their variants have become the predominant strategies for modeling regional or areal-referenced spatial data. The overwhelmingly wide use of the CAR/SAR models motivates the need for new classes of models for areal-referenced data. Thus, we develop a novel class of Markov random fields based on truncating the full conditional distribution. We define this truncation in two ways, leading to two versions of what we call the truncated autoregressive (TAR) model. First, we truncate the full conditional distribution so that a response at one location is close to the average of its neighbors. This strategy establishes relationships between TAR and CAR. Second, we truncate the joint distribution of the data process in a similar way. This specification leads to a connection between the TAR and SAR models. Our Bayesian implementation does not use Markov chain Monte Carlo (MCMC) and instead generates samples directly from the posterior distribution. Moreover, TAR does not have the range parameter that arises in the CAR/SAR models, which can be difficult to learn. We present the results of the proposed truncated autoregressive model on several simulated datasets and on a dataset of average property prices.
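
To make the proximity idea concrete, the following is a minimal sketch of checking whether every response lies within a tolerance of the average of its neighbors. It is an illustration only: the function names, the neighbor-list representation, and the specific form $|y_i - \bar{y}_{N(i)}| \le \delta$ are assumptions for this example and are not taken from the paper's definition of the truncation set.

```python
def neighbor_average(y, neighbors):
    """Return, for each location i, the mean of y over the neighbors of i.

    `neighbors` maps each index to a non-empty list of neighbor indices
    (a hypothetical representation chosen for this sketch).
    """
    averages = []
    for i in range(len(y)):
        nbrs = neighbors[i]
        if not nbrs:
            raise ValueError(f"location {i} has no neighbors")
        averages.append(sum(y[j] for j in nbrs) / len(nbrs))
    return averages


def satisfies_proximity(y, neighbors, delta):
    """True if |y_i - neighbor average| <= delta at every location.

    This inequality is an assumed form of the proximity constraint,
    used here only to illustrate the idea described in the abstract.
    """
    averages = neighbor_average(y, neighbors)
    return all(abs(yi - ai) <= delta for yi, ai in zip(y, averages))


# Hypothetical path graph with four locations: 0 - 1 - 2 - 3
path_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

smooth = [0.0, 0.5, 1.0, 1.5]   # changes gradually along the path
rough = [0.0, 5.0, 0.0, 5.0]    # alternates sharply between neighbors

print(satisfies_proximity(smooth, path_neighbors, delta=1.0))
print(satisfies_proximity(rough, path_neighbors, delta=1.0))
```

Under this assumed constraint, the gradually varying vector is admissible for `delta=1.0` while the sharply alternating one is not, which is the sense in which a truncation of this kind would favor spatially smooth responses.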

Paper Structure

This paper contains 24 sections, 13 theorems, 33 equations, 10 figures, 4 tables.

Key Result

Proposition 1

Upon integrating out $\{u_{i}\}$, the model in 2eq:trunc1 becomes where $\widetilde{Y}_{i}=Y_{i}-\boldsymbol{\mathbf{x}}_{i}'\boldsymbol{\mathbf{\beta}}$ for $i=1,\ldots,n$.

Figures (10)

  • Figure 3.1: The correlation matrix of $\boldsymbol{\mathbf{x}} \in \mathbb{R}^{100}$ based on 1000 replicates from \ref{2eq:truncatedNormal}, where $\boldsymbol{\mathbf{\mu}}=\boldsymbol{\mathbf{0}}_{100}$, $\boldsymbol{\mathbf{\Sigma}}=\boldsymbol{\mathbf{I}}_{100}$, and $S$ is based on \ref{2eq:S}.
  • Figure 5.1: Image plot of the simulated data based on the simulation settings given in Items (a)–(d), with $\boldsymbol{\mathbf{\beta}}=(2,5)'$, $\sigma_{Y}^{2}=0.5$, $\delta=1$, and $\rho=-0.606$.
  • Figure 5.2: Density of the posterior samples for $\beta_{1}$ (first row), $\beta_{2}$ (second row), and $\sigma_{Y}^{2}$ (third row) when implementing the TARC model with $\delta=1$. The vertical dashed lines mark the true values $\beta_{1}=2$, $\beta_{2}=5$, and $\sigma_{Y}^{2}=0.5$. The fourth row plots the predictions at the missing locations against the true data, where the straight line is the 45° reference line. The plots in the left and right columns are generated using the simulated datasets based on Items (a) and (b), respectively.
  • Figure 5.3: Density of the posterior samples for $\beta_{1}$ (first row), $\beta_{2}$ (second row), and $\sigma_{Y}^{2}$ (third row) when implementing the TARS model with $\delta=1$. The vertical dashed lines mark the true values $\beta_{1}=2$, $\beta_{2}=5$, and $\sigma_{Y}^{2}=0.5$. The fourth row plots the predictions at the missing locations against the true data, where the straight line is the 45° reference line. The plots in the left and right columns are generated using the simulated datasets based on Items (c) and (d), respectively.
  • Figure 5.4: Performance comparison between the TARC model and the CAR model when the data are simulated from the CAR model, i.e., Item (a). For each metric, we show a box plot over 20 replicates.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Proposition 1
  • Proposition 2
  • Corollary 2.1
  • Corollary 2.2
  • Proposition 3
  • Proposition 4
  • Corollary 4.1
  • Corollary 4.2
  • Proposition 5
  • Corollary 5.1
  • ...and 3 more