Spatial Confounding: A review of concepts, challenges, and current approaches

Isaque Vieira Machado Pim; Luiz Max Fagundes de Carvalho; Marcos Oliveira Prates

TL;DR

This review addresses spatial confounding across areal and geostatistical data, clarifying definitions, estimands, and the bias–variance trade-offs of leading methods. It unifies approaches from spatial statistics and causal inference, including Restricted Spatial Regression, Spatial+, spectral adjustments, and joint Gaussian constructions, and provides a comprehensive head-to-head empirical comparison on real datasets. The analytical framework links smoothing to omitted-variable bias, offering insight into when and why existing methods succeed or fail, and highlights issues such as Type-S errors and coverage. The work culminates in practical recommendations, emphasizing context-dependent method choice, scale considerations, and uncertainty propagation, while outlining avenues for future research in spatio-temporal confounding and benchmarking. The synthesis advances the goal of reliable causal inference in spatial settings by mapping methodological choices to data-generating processes and highlighting where further methodological and computational development is needed.

Abstract

Spatial confounding is a persistent challenge in spatial statistics, influencing the validity of statistical inference in models that analyze spatially structured data. The concept has been interpreted in various ways but is broadly defined as bias in estimates arising from unmeasured spatial variation. In this paper we review definitions, classical spatial models, and recent methodological advances, including approaches from spatial statistics and causal inference. We provide a unified view of the many available approaches for areal as well as geostatistical data and discuss their relative merits both theoretically and empirically with a head-to-head comparison on real datasets. Finally, we leverage the results of the empirical comparisons to discuss directions for future research.
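
As a rough illustration of the definition above (bias arising from unmeasured spatial variation), the following minimal simulation sketch may help. It is not taken from the paper: the spatial surface `z`, the coefficients, the noise levels, and the helper `ols` are all assumptions chosen for illustration. An unobserved smooth surface drives both the covariate and the response, so a regression that omits it overstates the covariate effect, while a regression that includes it recovers the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical setup: random locations on the unit square.
s = rng.uniform(0.0, 1.0, size=(n, 2))

# Assumed unmeasured spatial variable: a smooth surface over the locations.
z = np.sin(2 * np.pi * s[:, 0]) + np.cos(2 * np.pi * s[:, 1])

# The covariate shares spatial structure with z (the source of confounding).
x = z + rng.normal(size=n)

# The response depends on x (true effect 1.0) and on the unmeasured z.
beta_true = 1.0
y = beta_true * x + 2.0 * z + rng.normal(scale=0.5, size=n)


def ols(design, response):
    """Least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


ones = np.ones(n)

# Naive fit: z is omitted, so its effect leaks into the slope on x.
beta_naive = ols(np.column_stack([ones, x]), y)[1]

# Oracle fit: z is included, which is not possible in practice.
beta_adjusted = ols(np.column_stack([ones, x, z]), y)[1]

print(f"true effect:     {beta_true:.2f}")
print(f"naive estimate:  {beta_naive:.2f}")
print(f"oracle estimate: {beta_adjusted:.2f}")
```

In practice `z` is unobserved, so the oracle fit is unavailable; the methods reviewed in the paper can be read as different strategies for approximating that adjustment from the spatial structure alone.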

Paper Structure (28 sections, 2 theorems, 18 equations, 4 figures)

Introduction
Background and notation
Early work on Spatial Statistics
Spatial filtering methods
Restricted Spatial Regressions
Restricted Spatial Regressions on areal data
Restricted Spatial Regressions on Geostatistical data
Transformed Gaussian Markov Random Fields
Consequences of orthogonal projection of spatial effects
Scale of Confusion
Spatial Basis Adjustment and Regularization
Spatial+
Spectral Adjustment
Correlating Gaussian Random Fields
An analytical framework for confounding
...and 13 more sections

Key Result

Lemma 6.1

Let $\alpha_1 \leq \dots \leq \alpha_p$ be the eigenvalues of the penalty matrix $\mathbf{S}$ and $\lambda > 0$ the smoothing parameter. Then the eigenvalues of the precision matrix $\mathbf{\Sigma}^{-1}$ are given by $\{\sigma^{-2}, \sigma^{-2} w_1, \dots, \sigma^{-2} w_p\}$, where $w_i = \lambda \alpha_i /

Figures (4)

Figure 1: Estimated effects across all methods. Left: Scotland lip cancer (AFF). Right: Slovenia stomach cancer (socio-economic status). Frequentist methods are shown as point estimates with 95% confidence intervals and Bayesian methods as posterior means with 95% credible intervals.
Figure 2: Estimated effects across all methods. Left: Pennsylvania lung cancer (smoking prevalence). Right: Dowry deaths in Uttar Pradesh (key socio-economic covariate). Frequentist methods are shown as point estimates with 95% confidence intervals and Bayesian methods as posterior means with 95% credible intervals.
Figure 3: Forestry data. Estimated effects of (left) tree age and (right) May minimum temperature across all methods. Frequentist methods are shown as point estimates with 95% confidence intervals and Bayesian methods as posterior means with 95% credible intervals.
Figure 4: Malaria in Gambia. Estimated effects of (left) vegetation greenness and (right) mosquito net usage across all methods. Frequentist methods are shown as point estimates with 95% confidence intervals and Bayesian methods as posterior means with 95% credible intervals.

Theorems & Definitions (2)

Lemma 6.1
Proposition 1
