Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association

David R. Burt, Renato Berlinghieri, Stephen Bates, Tamara Broderick

TL;DR

The paper tackles uncertainty quantification for spatial covariate–response associations in settings with misspecification and nonrandom target locations. It introduces Lipschitz-Driven Inference, which leverages a Lipschitz-smooth mean function in space and a Wasserstein-based bias bound to construct valid confidence intervals for the target-conditional OLS coefficients. The method achieves nominal coverage in finite samples when the noise variance $\sigma^2$ is known and remains asymptotically valid when $\sigma^2$ is unknown, outperforming several baselines in simulations and a tree-cover real-data study. This approach enables robust, interpretable inference for spatial associations without requiring covariate overlap or perfectly specified models, with practical guidance on selecting the Lipschitz constant $L$.

Abstract

Estimating associations between spatial covariates and responses, rather than merely predicting responses, is central to environmental science, epidemiology, and economics. For instance, public health officials might be interested in whether air pollution has a strictly positive association with a health outcome, and in the magnitude of any effect. Standard machine learning methods often provide accurate predictions but offer limited insight into covariate-response relationships. We show that existing methods for constructing confidence (or credible) intervals for associations can fail to provide nominal coverage in the face of model misspecification and nonrandom locations, even though both are essentially always present in spatial problems. We introduce a method that constructs valid frequentist confidence intervals for associations in spatial settings. Our method requires minimal assumptions beyond a form of spatial smoothness and a homoskedastic Gaussian error assumption. In particular, we do not require model correctness or covariate overlap between training and target locations. Our approach is the first to guarantee nominal coverage in this setting and outperforms existing techniques in both real and simulated experiments. Our confidence intervals are valid in finite samples when the variance of the Gaussian noise is known, and we provide an asymptotically consistent estimation procedure for this noise variance when it is unknown.

Paper Structure

This paper contains 36 sections, 6 theorems, 76 equations, 14 figures, 1 algorithm.

Key Result

Lemma 6

Let $b \in [-B, B]$, $\tilde{c} > 0$, and $\alpha \in (0,1)$. Then the narrowest $1-\alpha$ confidence interval that is symmetric and valid for all $\mathcal{N}(b, \tilde{c}^2)$ is of the form $[-B-\tilde{c}\Delta, B+\tilde{c}\Delta]$, where $\Delta$ is the solution of $\Phi\left(\Delta\right) - \Phi(\dots)$ ...
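
The defining equation for $\Delta$ is not shown in full here. As a sketch under a stated assumption, rather than the paper's verbatim condition: if the interval $[-B-\tilde{c}\Delta, B+\tilde{c}\Delta]$ must cover a single draw from $\mathcal{N}(b, \tilde{c}^2)$ for every $|b| \le B$, then coverage is smallest at $b = \pm B$, where it equals $\Phi(\Delta) - \Phi(-\Delta - 2B/\tilde{c})$. Setting that worst-case coverage to $1-\alpha$ determines $\Delta$. The Python below solves this assumed equation by bisection; the function names `worst_case_coverage` and `solve_delta` are hypothetical and do not come from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def worst_case_coverage(delta: float, B: float, c: float) -> float:
    """Coverage of [-B - c*delta, B + c*delta] for a draw from N(b, c^2) at b = B.

    Assumption (not taken from the paper): b = +/-B is the worst case,
    which holds because the normal density is symmetric and unimodal.
    """
    return std_normal_cdf(delta) - std_normal_cdf(-delta - 2.0 * B / c)


def solve_delta(B: float, c: float, alpha: float, iters: int = 200) -> float:
    """Find delta with worst_case_coverage(delta) = 1 - alpha by bisection."""
    if B < 0 or c <= 0 or not (0.0 < alpha < 1.0):
        raise ValueError("need B >= 0, c > 0, and alpha in (0, 1)")
    target = 1.0 - alpha
    # At delta = -B/c the interval is empty, so coverage is 0 < target.
    lo = -B / c
    hi = 1.0
    # Coverage is increasing in delta, so grow hi until it brackets the root.
    while worst_case_coverage(hi, B, c) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if worst_case_coverage(mid, B, c) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # No bias (B = 0): recovers the usual two-sided normal quantile.
    print(solve_delta(B=0.0, c=1.0, alpha=0.05))
    # Bias bound much larger than the noise scale: approaches the one-sided quantile.
    print(solve_delta(B=10.0, c=1.0, alpha=0.05))
```

Under this assumed equation, $\Delta$ moves from the two-sided quantile $z_{1-\alpha/2}$ when $B = 0$ toward the one-sided quantile $z_{1-\alpha}$ as $B/\tilde{c}$ grows, so the interval's half-width $B + \tilde{c}\Delta$ is narrower than the naive $B + \tilde{c}\,z_{1-\alpha/2}$ whenever $B > 0$.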

Figures (14)

  • Figure 1: Coverages (left) and confidence interval widths (right) for our method as well as 5 other methods (3 methods in the lower experiment). In the upper experiment, our method and GP BCIs consistently achieve the nominal coverage (95%); the GP BCIs line (dashed blue) overlaps with ours (solid black) for most shifts. Of the two methods with correct coverage, our method yields much narrower intervals. In the lower experiment, only our method achieves the nominal coverage. The shaded region for coverage is a (conservative) 95% confidence interval while the shaded region for CI width is $\pm 2$ standard deviations; for more detail, see \ref{app:reported-uncertainty-simulation}.
  • Figure 2: Left: the confidence interval width of our method as a function of shift for each Lipschitz constant $L$. All $L$ yield coverage of $1.0$. Middle and right: the confidence interval width (solid line with dot marker) as a function of the Lipschitz constant for $\mathrm{shift}=0$ (middle) and $\mathrm{shift}=0.8$ (right). The vertical axis is shared across all three plots. The bias contribution to the width (dashed line, x marker) is monotonically increasing in $L$. The randomness contribution (dashed line, square marker) is monotonically decreasing.
  • Figure 3: Coverages (upper) and confidence interval widths (lower) for our method as well as 5 other methods. Each column represents a parameter in the tree cover experiment. Only our method consistently achieves the nominal coverage.
  • Figure 4: Spatial sites for the source (blue) and target (orange) data are shown in the leftmost plots for different values of the shift parameter used in generating the data. More extreme values of the shift parameter lead to larger biases in parameter estimation from the training data without adjustment. The third plot from the left shows the covariate surface, while the fourth shows the expected response at each spatial location.
  • Figure 5: The first 3 plots from the left show the covariate surfaces, while the fourth shows the expected response at each spatial location for the second simulated experiment. The source and target locations (not shown) are the same as in \ref{fig:two-dim-shift-data}, though with $N=10{,}000$.
  • ...and 9 more figures

Theorems & Definitions (15)

  • Definition 5: Nearest-Neighbor Weight Matrix
  • Lemma 6
  • Proposition 8
  • Corollary 9
  • Definition 10: Nearest-Neighbor Weight Matrix
  • Theorem 11
  • proof
  • Proposition 12
  • proof
  • proof: Proof of \ref{lem:shortest-ci}
  • ...and 5 more