Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

Matthew Kosko; Falco J. Bargagli-Stoffi; Lin Wang; Michele Santacatterina

Abstract

Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods are known to scale poorly with sample size, and the repeated re-fitting required by the bootstrap exacerbates this limitation. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method on three representative kernel-based estimators: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. In simulations, our method yields confidence intervals with nominal coverage at a fraction of the computational cost of the standard bootstrap. We further demonstrate its utility in a real-world application, using the NVSS dataset to estimate the effect of any amount of smoking on birth weight as well as the optimal treatment regime; at this scale, the standard bootstrap is computationally prohibitive.
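To make "scale poorly" concrete for the NVSS sample size, the sketch below does the back-of-the-envelope arithmetic for a dense kernel (Gram) matrix and a crude fitting-cost comparison. Everything beyond n is an assumption for illustration, not a figure from the paper: 8-byte float64 entries, an exact fitting cost that grows like m^3 in the number of points m, B = 1000 bootstrap refits, and a hypothetical subset size b = n^0.7. The resampling cost of cBLB (on the order of s·R·b operations) is ignored because it does not involve refitting.

```python
# Rough scale of the problem: memory for a dense n x n float64 Gram matrix and
# a crude cubic-cost comparison. Assumptions (not from the paper): 8 bytes per
# entry, exact O(m^3) fitting cost, B = 1000 bootstrap refits, b = n**0.7.
BYTES_PER_ENTRY = 8


def gram_bytes(m: int) -> int:
    """Bytes needed to store a dense m x m float64 kernel (Gram) matrix."""
    return m * m * BYTES_PER_ENTRY


n = 3_596_017              # registered births in the 2023 NVSS natality data
b = int(n ** 0.7)          # hypothetical subset size b = n^gamma, gamma = 0.7
s = n // b                 # number of disjoint subsets covering (almost) all data
B = 1000                   # hypothetical number of standard-bootstrap refits

full_tb = gram_bytes(n) / 1e12
subset_gb = gram_bytes(b) / 1e9
# Standard bootstrap: B refits on n points; cBLB without refit: s fits on b points.
# The cheap multinomial resampling step of cBLB is deliberately left out.
cost_ratio = (B * n**3) / (s * b**3)

print(f"Gram matrix, full sample: {full_tb:,.0f} TB")
print(f"Gram matrix, one subset (b={b:,}): {subset_gb:,.1f} GB")
print(f"cubic-cost ratio, bootstrap vs. cBLB fits: {cost_ratio:.2e}")
```

Under these assumptions the full-sample Gram matrix alone needs on the order of a hundred terabytes, while a single subset's matrix fits in the memory of a large workstation; practical kernel solvers often use approximations, so the cubic ratio should be read as an order-of-magnitude illustration only.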

Paper Structure (25 sections, 1 theorem, 22 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Contribution
Smoking and Birthweight in the NVSS: A Setting Where the Standard Bootstrap is Computationally Prohibitive
Related Work
Setup: Kernel-based Estimators for Treatment Effect Estimation and Policy Learning
Kernelized Augmented Outcome-Weighted Learning
Kernel Minimax Weights
Double Machine Learning with Support Vector Machines
Causal Bag of Little Bootstraps
Properties
Computational complexity
Simulation
Results
Practical Considerations
Size of the subsets $b$
...and 10 more sections

Key Result

theorem 1

Fix a subset $I_k=\{i_1,\ldots,i_b\}$ of size $b$ and fitted objects computed on $\{Z_i:i\in I_k\}$, yielding contributions $\{\hat{\theta}_{i,k}: i\in I_k\}$ and the subset estimator $\hat{\theta}_k=\frac{1}{b}\sum_{i\in I_k}\hat{\theta}_{i,k}$. Let $M=(M_1,\ldots,M_b)\sim\mathrm{Multinomial}(n;1/b,\ldots,1/b)$ and define the cBLB replicate $\hat{\theta}_k^{*}=\frac{1}{n}\sum_{j=1}^{b}M_j\,\hat{\theta}_{i_j,k}$. Assume: (i) the corresponding full-sample estimator admits an influence-function expansion with influence function …
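The "no refit" resampling step in the theorem can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `contributions_fn` is a hypothetical callable standing in for whatever the estimator fits on a subset (for example, kernel nuisance models) and returns one contribution per row; we assume the replicate is the multinomial-weighted average $\frac{1}{n}\sum_{j=1}^{b}M_j\hat{\theta}_{i_j,k}$, so each replicate reweights the subset contributions without any new fit; and, following the original Bag of Little Bootstraps, percentile-interval endpoints are averaged across disjoint subsets. The subset size $b=n^{0.7}$ and $R=100$ replicates are illustrative choices.

```python
import numpy as np


def cblb_no_refit_ci(contributions_fn, data, b, s, R=100, alpha=0.05, seed=None):
    """Sketch of a cBLB-style percentile interval that never refits.

    contributions_fn is a hypothetical callable: given the rows of one subset,
    it fits any nuisance models on that subset only and returns one
    contribution theta_hat_{i,k} per row.
    """
    n = len(data)
    if s * b > n:
        raise ValueError("need s * b <= n for disjoint subsets")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    lowers, uppers = [], []
    for k in range(s):
        idx = perm[k * b:(k + 1) * b]  # disjoint subset I_k of size b
        # The only model fitting happens here, once per subset.
        theta = np.asarray(contributions_fn(data[idx]), dtype=float)
        # R draws of M ~ Multinomial(n; 1/b, ..., 1/b); each row sums to n.
        M = rng.multinomial(n, np.full(b, 1.0 / b), size=R)
        replicates = M @ theta / n  # (1/n) * sum_j M_j * theta_j, no refit
        lo, hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
        lowers.append(lo)
        uppers.append(hi)
    # BLB-style aggregation: average the per-subset interval endpoints.
    return float(np.mean(lowers)), float(np.mean(uppers))


# Toy usage: each row's "contribution" is the row itself, so the target is a mean.
rng = np.random.default_rng(0)
n = 20_000
y = rng.normal(loc=2.0, scale=1.0, size=n)
b = int(n ** 0.7)  # hypothetical subset size b = n^gamma with gamma = 0.7
s = n // b         # use (almost) all of the data across disjoint subsets
lower, upper = cblb_no_refit_ci(lambda rows: rows, y, b=b, s=s, R=100, seed=1)
print(f"95% interval for the mean: [{lower:.4f}, {upper:.4f}]")
```

Because the multinomial weights sum to $n$, each replicate behaves like a mean of $n$ draws from the subset's empirical distribution, so the interval width reflects full-sample ($n$-scale) variability even though each fit only sees $b$ points. In the toy example the width should be close to $2\times 1.96/\sqrt{n}\approx 0.028$.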

Figures (7)

Figure 1: Confidence intervals for the optimal value from 1000 replications of the cBLB algorithm (Kernelized AOL)
Figure 2: Confidence intervals for the ATE from 1000 replications of the cBLB algorithm (Kernel Minimax Weights)
Figure 3: Confidence intervals for the ATE from 1000 replications of the cBLB algorithm (DML using SVM and cross-fitting)
Figure 4: Timing results from 25 replications of the cBLB algorithm ($n = 5000$) for Kernelized AOL
Figure 5: Timing results from 25 replications of the cBLB algorithm ($n = 5000$) for Kernel Minimax Weights
...and 2 more figures
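Figures 1 to 3 summarize 1000 simulation replications, so any empirical coverage rate carries Monte Carlo error. The sketch below computes empirical coverage and its binomial standard error; the helper names are hypothetical and the calculation is the standard proportion formula, not a procedure taken from the paper.

```python
import math


def empirical_coverage(intervals, truth):
    """Fraction of (lower, upper) intervals that contain `truth`."""
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)


def coverage_mc_se(p, n_reps):
    """Monte Carlo standard error of a coverage proportion p over n_reps replications."""
    return math.sqrt(p * (1 - p) / n_reps)


# If the true coverage is 95%, how far can 1000 replications plausibly stray?
se = coverage_mc_se(0.95, 1000)
band = (0.95 - 1.96 * se, 0.95 + 1.96 * se)
print(f"MC s.e. at nominal 95% with 1000 reps: {se:.4f}")
print(f"approximate 95% band for empirical coverage: [{band[0]:.3f}, {band[1]:.3f}]")
```

With 1000 replications, empirical coverage anywhere within roughly 1.4 percentage points of 95% is consistent with nominal coverage.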

Theorems & Definitions (1)

theorem 1: First-order validity of cBLB (no refit)
