Predicting Long Term Sequential Policy Value Using Softer Surrogates

Hyunji Nam; Allen Nie; Ge Gao; Vasilis Syrgkanis; Emma Brunskill

TL;DR

This work addresses predicting the long-term value of a sequential policy that introduces new actions, when full-horizon on-policy data are not yet available. It introduces soft surrogates and a dynamic invariance assumption to connect short-horizon on-policy data with long-horizon off-policy data, enabling estimation of $V^{\pi_e}$ from limited observations. The authors propose regression-based soft surrogate estimators and doubly robust extensions, with finite-sample guarantees under covariate shift and cross-fitting. In HIV treatment and sepsis ICU simulations, the estimators give accurate predictions after observing as little as $10\%$ of the full horizon and outperform several baselines. The approach offers a principled way to evaluate novel sequential policies without lengthy trials, which matters in high-stakes domains such as healthcare and may extend to education and online systems.
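To make the regression-based idea concrete, here is a minimal sketch, not the authors' estimator: it fits a regressor on historical data that maps short-horizon features to the observed long-term return, then averages its predictions over short-horizon features collected under the new policy. The linear model, the toy data, and the names `fit_linear` and `surrogate_value_estimate` are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def surrogate_value_estimate(hist_short, hist_return, onpolicy_short):
    """Hypothetical plug-in estimate of the new policy's long-term value.

    hist_short:     short-horizon features from historical (off-policy) data
    hist_return:    full-horizon returns observed in the historical data
    onpolicy_short: short-horizon features observed under the new policy
    """
    predictor = fit_linear(hist_short, hist_return)
    return float(np.mean(predictor(onpolicy_short)))

# Toy data (assumed): the long-term return depends on the short-horizon
# features in the same way under both policies; only the feature
# distribution shifts when the new policy is deployed.
rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0])
hist_short = rng.normal(0.0, 1.0, size=(2000, 2))
hist_return = 1.0 + hist_short @ true_coef + rng.normal(0.0, 0.1, size=2000)
onpolicy_short = rng.normal(0.5, 1.0, size=(5000, 2))

estimate = surrogate_value_estimate(hist_short, hist_return, onpolicy_short)
true_value = 1.0 + 0.5 * 2.0 + 0.5 * (-1.0)  # expected return under the shifted features
print(f"estimate={estimate:.3f}, true value={true_value:.3f}")
```

In this toy setup the regression is correctly specified, so the plug-in average recovers the true value up to sampling noise. When the regression is misspecified or the relationship between short-term features and long-term return differs across policies, this simple estimate can be biased.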

Abstract

Off-policy policy evaluation (OPE) estimates the outcome of a new policy using historical data collected from a different policy. However, existing OPE methods cannot handle cases where the new policy introduces novel actions. This issue commonly occurs in real-world domains such as healthcare, as new drugs and treatments are continuously developed. Novel actions necessitate on-policy data collection, which can be burdensome and expensive if the outcome of interest takes a substantial amount of time to observe, for example in multi-year clinical trials. This raises a key question: how can we predict the long-term outcome of a policy after observing only its short-term effects? Although this problem is intractable in general, under some surrogacy conditions the short-term on-policy data can be combined with the long-term historical data to make accurate predictions about the new policy's long-term value. In two simulated healthcare examples, HIV and sepsis management, we show that our estimators can provide accurate predictions of the policy value after observing only 10% of the full-horizon data. We also provide a finite-sample analysis of our doubly robust estimators.
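As a rough illustration of what a doubly robust estimator with cross-fitting can look like under covariate shift, the sketch below combines a regression prediction with a density-ratio-weighted residual correction. It is a generic construction, not the paper's estimator: the function names, the two-fold split, the deliberately misspecified linear regressor, and the assumption that the density ratio is known in closed form are all illustrative choices.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def dr_cross_fit(hist_x, hist_g, on_x, density_ratio, n_folds=2, seed=0):
    """Hypothetical cross-fitted doubly robust estimate.

    For each fold k, the regressor is fit on the other folds, then
    (a) averaged over on-policy features and (b) corrected with
    density-ratio-weighted residuals on the held-out historical fold.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(hist_x)), n_folds)
    estimates = []
    for k in range(n_folds):
        held_out = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        f_k = fit_linear(hist_x[train], hist_g[train])
        direct = np.mean(f_k(on_x))
        weights = density_ratio(hist_x[held_out])
        residuals = hist_g[held_out] - f_k(hist_x[held_out])
        estimates.append(direct + np.mean(weights * residuals))
    return float(np.mean(estimates))

# Toy data (assumed): historical features ~ N(0, 1), on-policy features ~ N(0.5, 1),
# and a quadratic outcome that the linear regressor cannot represent.
rng = np.random.default_rng(1)
hist_x = rng.normal(0.0, 1.0, size=(20000, 1))
hist_g = 1.0 + 2.0 * hist_x[:, 0] + hist_x[:, 0] ** 2 + rng.normal(0.0, 0.1, size=20000)
on_x = rng.normal(0.5, 1.0, size=(20000, 1))

# Ratio of the N(0.5, 1) density to the N(0, 1) density, known here by construction.
ratio = lambda x: np.exp(0.5 * x[:, 0] - 0.125)

plug_in = float(np.mean(fit_linear(hist_x, hist_g)(on_x)))
dr = dr_cross_fit(hist_x, hist_g, on_x, ratio)
true_value = 1.0 + 2.0 * 0.5 + (1.0 + 0.5 ** 2)  # E[1 + 2x + x^2] for x ~ N(0.5, 1)
print(f"plug-in={plug_in:.3f}, doubly robust={dr:.3f}, true value={true_value:.3f}")
```

The toy example is built so that the regression alone is biased while the weighted residual term removes most of that bias. In practice the density ratio would have to be estimated, and the quality of the estimate then depends on both nuisance models.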

Paper Structure (45 sections, 9 theorems, 50 equations, 1 figure, 8 tables, 1 algorithm)

INTRODUCTION
RELATED WORK
Problem Setting & Notation
Assumption
Estimators
Soft surrogate estimator
Doubly robust soft surrogate estimator
THEORY
Experiments
The Robustness of the Proposed Estimator
Experiments on Clinical Domains
HIV treatment
Sepsis management in ICU
Baselines
Results
...and 30 more sections

Key Result

Theorem 1

Assume that $|G|, |f^{(k)}(\tau)|, |f(\tau)|$ are asymptotically bounded by $C_1H$ and $\hat{h}^{(k)}(\tau), h(\tau)$ are asymptotically bounded by $C_2$, where $C_1$ and $C_2$ are constants. Under the surrogacy and coverage assumptions, w.p. at least $1-\delta$:

Figures (1)

Figure 1: We are interested in the task of predicting the patient's long-term outcome after only a short observation window. The standard surrogacy assumption (Athey et al., 2019) fails because of the red trajectory, where later decisions cause a very different outcome than in other trajectories with the same initial $h$ observations. In this work we leverage a narrower definition of surrogacy, similar to Battocchi et al. (2021), and show that it enables effective policy evaluation.

Theorems & Definitions (14)

Theorem 1: Variance-Based Rate for DR
Corollary 1
Theorem 2
Corollary 2
Theorem 3: Doubly Robust Bias Bound
proof
Theorem 4: Variance-Based Rate for DR
proof
Theorem 5
proof
...and 4 more
