Spatially Robust Inference with Predicted and Missing at Random Labels

Stephen Salerno, Zhenke Wu, Tyler McCormick

Abstract

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a process which has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.
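To make the spatial HAC idea concrete, the following is a minimal sketch of a generic spatial HAC variance estimate for a sample mean of per-unit scores. It is not the paper's jackknife correction, whose exact form is not given in this summary. The function name `spatial_hac_variance`, the product Bartlett kernel, and the `bandwidth` parameter are illustrative assumptions.

```python
import numpy as np

def spatial_hac_variance(scores, coords, bandwidth):
    """Generic spatial HAC variance estimate for the mean of `scores`.

    Assumption: a product Bartlett kernel over coordinate-wise distances,
    chosen here because it yields a positive semi-definite weight matrix.
    """
    s = np.asarray(scores, dtype=float)
    s = s - s.mean()                                   # center the scores
    c = np.asarray(coords, dtype=float)
    diff = np.abs(c[:, None, :] - c[None, :, :])       # shape (n, n, dim)
    # Weight 1 for a unit with itself, decaying linearly to 0 at `bandwidth`.
    w = np.clip(1.0 - diff / bandwidth, 0.0, None).prod(axis=-1)
    n = s.size
    return float(s @ w @ s) / n**2

# Hypothetical usage on simulated locations and scores.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(50, 2))
scores = rng.normal(size=50)
v_iid_like = spatial_hac_variance(scores, coords, 1e-9)  # no neighbors get weight
v_spatial = spatial_hac_variance(scores, coords, 2.0)    # nearby pairs contribute
```

With a vanishing bandwidth only the diagonal terms survive, so the estimate reduces to the usual iid variance of the mean. Larger bandwidths add cross-products between nearby units, which is the mechanism by which spatial dependence changes interval width.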

Paper Structure (36 sections, 3 theorems, 58 equations, 13 figures, 5 tables)

Key Result

Proposition 1

Under Assumption \ref{ass:mar}, $\mathbb{E}[\psi_i(\theta_0;m,\pi_0)] = 0$ for any measurable $m$, and $\mathbb{E}[\psi_i(\theta_0;m_0,\pi)] = 0$ for any measurable $\pi$ that satisfies the same overlap bound.
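The double robustness property can be checked numerically. The sketch below assumes the standard augmented inverse-probability-weighted score for a mean, $\psi_i(\theta;m,\pi) = m(X_i) + \frac{R_i}{\pi(X_i)}\{Y_i - m(X_i)\} - \theta$, where $R_i$ indicates a labeled unit. The summary does not state the paper's exact definition of $\psi_i$, so this form, the simulated data, and the names `dr_score`, `m_wrong`, and `pi_wrong` are illustrative assumptions.

```python
import numpy as np

def dr_score(theta, y, r, m_x, pi_x):
    """Assumed AIPW-type score: m(X) + R / pi(X) * (Y - m(X)) - theta."""
    # Unlabeled rows (r == 0) contribute only the prediction m(X).
    resid = np.where(r == 1, y - m_x, 0.0)
    return m_x + resid / pi_x - theta

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
m0 = 1.0 + 2.0 * x                        # true outcome regression E[Y | X]
pi0 = 1.0 / (1.0 + np.exp(-(0.5 + x)))    # true labeling propensity P(R = 1 | X)
y = m0 + rng.normal(size=n)
r = rng.binomial(1, pi0)                  # MAR: labeling depends on X only
theta0 = 1.0                              # E[Y] = 1 + 2 * E[X] = 1

# Wrong outcome model, correct propensity: the score mean stays near zero.
m_wrong = np.zeros(n)
mean_wrong_m = dr_score(theta0, y, r, m_wrong, pi0).mean()

# Correct outcome model, wrong propensity (still bounded away from zero).
pi_wrong = np.full(n, 0.5)
mean_wrong_pi = dr_score(theta0, y, r, m0, pi_wrong).mean()

# Both nuisances wrong: the score mean is no longer centered at zero.
mean_both_wrong = dr_score(theta0, y, r, m_wrong, pi_wrong).mean()
```

In this simulation the first two averages are close to zero up to Monte Carlo error, while the third is clearly biased. This matches the proposition: one correctly specified nuisance is enough for the moment condition to hold.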

Figures (13)

  • Figure 1: Synthetic spatial-field simulation (superpopulation target): boxplots of empirical 90% coverage versus dependence level $\sigma$. Panels split by missingness arm (MCAR vs MAR) and sampling regime (iid vs soft-block). For each $\sigma$, each box summarizes the distribution across 100 population replicates of empirical coverage computed from 200 repeated sample draws of size $n=600$ from a single population replicate. Methods shown are Cross-PPI, PPI++, Bootstrap-PPI, and Spatial DR-JK-HAC. The dashed line marks nominal coverage. Under MAR, baseline models under-cover even under iid sampling; under MCAR, these models under-cover when soft-block sampling makes spatial dependence visible. Spatial DR-JK-HAC stays near nominal in all four cells, at the cost of modestly wider intervals (see Appendix \ref{app:synthwidth}).
  • Figure 2: Synthetic simulation (superpopulation target): confidence-interval length boxplots by missingness arm and sampling regime. The $x$-axis is $\sigma\in\{0,40,80,120\}$. Each box summarizes mean interval length over 200 repeated sample draws across 100 replicate populations. Purple shades denote DR-based methods (DR-iid, spatial DR-HAC, spatial DR-JK-HAC). The figure is self-contained: the dependence-aware methods inflate width relative to iid post-prediction baselines, and spatial DR-JK-HAC narrows intervals relative to spatial DR-HAC while retaining dependence-aware coverage.
  • Figure 3: Synthetic simulation: coverage boxplots for the full method menu. Methods include Cross-PPI, PPI++ (iid), Bootstrap-PPI (Efron 2025), DR-iid, spatial DR-HAC, and spatial DR-JK-HAC. The figure is self-contained: DR-iid and spatial DR-HAC improve over iid post-prediction baselines under dependence and MAR but do not match spatial DR-JK-HAC in the most challenging soft-block MAR cells, motivating the jackknife-HAC correction in the main text.
  • Figure 4: Replicate-level interval visualization under MAR and soft-block sampling. Each interval is centered by subtracting its replicate target mean, so the dashed vertical line at $0$ is the target in every panel. Cross-PPI (blue) shows systematic mis-centering in multiple panels, consistent with the MAR under-coverage seen in Figure \ref{fig:synth_cov_box}. Spatial DR-JK-HAC (purple) remains centered and attains near-nominal coverage.
  • Figure 5: Replicate-level interval visualization under MCAR and soft-block sampling, with the same centering convention as Figure \ref{fig:marcis}. Under MCAR, iid baselines are often closer to nominal than under MAR, but Cross-PPI still under-covers in the dependence-rich synthetic panel as dependence becomes visible under soft-block sampling. Spatial DR-JK-HAC remains near nominal across panels at the cost of wider intervals.
  • ...and 8 more figures

Theorems & Definitions (9)

  • Proposition 1: Doubly Robust Identification
  • Proposition 2: Fold-Shared Noise Is Removed from Within Covariance
  • proof
  • Remark 1: Buffered Cross-Fitting
  • Remark 2: Propensity Clipping
  • Theorem 1: Asymptotic Normality and Valid CIs
  • proof
  • Remark 3: Scope of Theorem 1
  • proof