Bayesian nonparametric boundary detection for multiple areal data
Matteo Gianella, Mario Beraha, Alessandra Guglielmi
Abstract
We consider the problem of boundary detection for areal data, focusing on situations where for each areal unit multiple observations are available. We propose a Bayesian nonparametric mixture model for the area-specific population densities, with spatially dependent weights and a random number of components. Contrary to previously proposed methods for boundary detection, which consider one observation per areal unit, ours does not require external information such as area-specific covariates or dissimilarity metrics. Instead, by exploiting information from multiple samples per area, it is able to identify boundaries between areas that exhibit different densities. Crucially, the number of mixture components needs to be learned from data to obtain meaningful boundary detection, due to the non-identifiability of overfitted mixtures. Therefore, we assume it random by placing a prior on it. The motivating application is the analysis of economic inequality in the greater Los Angeles region, which typically yields social inequality and unrest. Efficient posterior computation is facilitated by a transdimensional Markov Chain Monte Carlo sampler which exploits the recently introduced \emph{optimal auxiliary priors} to improve the mixing. The methodology is validated via extensive simulations and applied to the income data in the greater Los Angeles region. We identify several boundaries in the income distributions, which can be explained \textit{ex-post} in terms of the percentage of the population without health insurance, though not in terms of the total number of crimes, showing the usefulness of such an analysis to policymakers.
