A Convex Framework for Confounding Robust Inference

Kei Ishikawa, Niao He, Takafumi Kanamori

TL;DR

This work addresses robust policy evaluation for offline contextual bandits under unobserved confounding by proposing a convex framework that yields a sharp lower bound on policy value. The core idea is to reformulate the infinite-dimensional conditional moment constraints as tractable convex problems via weight reparameterization and a kernel-based low-rank approximation (KCMC), with strong duality allowing an empirical risk minimization interpretation. The framework supports extensions to f-divergence-based uncertainty, model selection via cross-validation or information criteria, and robust policy learning, while guaranteeing consistency and asymptotic normality of both evaluation and learning procedures. Empirical results demonstrate tighter bounds and effective policy learning across discrete and continuous actions, validating the practical impact of this convex, kernel-based approach for confounding-robust inference.

Abstract

We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value using convex programming. The generality of our estimator enables various extensions such as sensitivity analysis with f-divergence, model selection with cross validation and information criterion, and robust policy learning with the sharp lower bound. Furthermore, our estimation method can be reformulated as an empirical risk minimization problem thanks to the strong duality, which enables us to provide strong theoretical guarantees of the proposed estimator using techniques of the M-estimation.
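To make the convex-programming idea concrete, here is a minimal sketch (not the authors' implementation) of a lower bound computed as a linear program. It assumes box constraints of the Tan type on the inverse-propensity weights, and it replaces the conditional moment constraint with a handful of unconditional moment constraints on user-supplied features. The function name `kcmc_lower_bound`, the synthetic data, and the hand-picked features `psi` are all hypothetical; the paper's kernel-based features are not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog


def kcmc_lower_bound(y, pi_eval, p_obs, psi, gamma):
    """Lower bound on policy value via a linear program (illustrative sketch).

    Solves
        min_w   (1/n) * sum_i w_i * pi_eval_i * y_i
        s.t.    lower_i <= w_i <= upper_i                    (box constraints)
                (1/n) * sum_i (w_i * p_obs_i - 1) * psi[i, d] = 0  for each d
    where the box is assumed to be 1 + (1/p_obs - 1) * [1/gamma, gamma].
    """
    n = len(y)
    inv_p = 1.0 / p_obs
    lower = 1.0 + (inv_p - 1.0) / gamma
    upper = 1.0 + (inv_p - 1.0) * gamma
    c = pi_eval * y / n                      # objective coefficients
    A_eq = (psi * p_obs[:, None]).T / n      # shape (D, n)
    b_eq = psi.mean(axis=0)                  # shape (D,)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=list(zip(lower, upper)), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Synthetic logged data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
p_treat = 1.0 / (1.0 + np.exp(-0.5 * x))         # nominal propensity of T=1
t = rng.binomial(1, p_treat)
p_obs = np.where(t == 1, p_treat, 1.0 - p_treat)  # propensity of logged action
y = x + t + rng.normal(scale=0.5, size=n)
pi_eval = np.where(t == 1, 0.7, 0.3)              # policy under evaluation

# Hand-picked features standing in for a low-rank kernel basis (assumption).
psi = np.column_stack([np.ones(n), x, t, t * x])

ipw = np.mean(pi_eval * y / p_obs)                # unconfounded IPW estimate
lb_15 = kcmc_lower_bound(y, pi_eval, p_obs, psi, gamma=1.5)
lb_20 = kcmc_lower_bound(y, pi_eval, p_obs, psi, gamma=2.0)
print(f"IPW: {ipw:.3f}, lower bound (1.5): {lb_15:.3f}, lower bound (2.0): {lb_20:.3f}")
```

Because the nominal weights `1 / p_obs` satisfy every constraint, the program is always feasible and the bound cannot exceed the IPW estimate; enlarging `gamma` widens the box and can only lower the bound. An upper bound follows from the same program with the objective negated.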

Paper Structure (39 sections, 20 theorems, 134 equations, 8 figures, 1 table)

Key Result

Lemma 2

Let $w^*_\mathrm{CMC}$, $w^*_\mathrm{KCMC}$, and $\hat{w}_\mathrm{KCMC}$ be defined as in eq:w_solutions. Then there exist a function $\eta_\mathrm{CMC}:\mathcal{T}\times\mathcal{X}\to\mathbb{R}$, vectors $\eta_\mathrm{KCMC}, \eta_\mathrm{KCMC}'\in\mathbb{R}^D$, and constants $\eta_f, \eta_f', \eta_f

Figures (8)

  • Figure 1: Graphical models of unconfounded and confounded contextual bandits
  • Figure 2: Estimated upper and lower bounds of policy value using different sensitivity parameter $\Gamma$ for the synthetic data of sample size 1000 with binary action space.
  • Figure 3: Estimated upper and lower bounds of policy value using different sensitivity parameters. Synthetic data of sample size 1000 with continuous action space (left) and the NLS data of sample size 668 with a binary action space (right) are used.
  • Figure 4: Estimated upper and lower bounds of policy value with 95% confidence intervals, with (right) and without (left) second-order correction. Solid lines and the surrounding bands indicate the point estimates (both without second-order correction) and their confidence intervals.
  • Figure 5: Acceptance rate of the null hypothesis under different null hypotheses with Tan's box constraints ($\Gamma=1.5$) at significance level $\alpha=0.05$. The left plot shows the full range and the right plot is a zoomed-in view.
  • ...and 3 more figures

Theorems & Definitions (33)

  • Example 1: Box constraints
  • Example 2: f-divergence constraint
  • Remark 1: A comparison to ZSB sensitivity model
  • Lemma 2: Characterization of solutions
  • Example 3: Solutions for box constraints
  • Theorem 3
  • Example 4: Lipschitz constant for box constraints
  • Example 5: Lipschitz constant for bounded conditional f-constraint
  • Lemma 4: Convergence of $\|\Pi_{\boldsymbol{\psi}} \eta^*_\mathrm{CMC} - \eta^*_\mathrm{CMC}\|$ with kernel PCA
  • Example 6: Derivation of the quantile balancing estimator [dorn2022sharp]
  • ...and 23 more