Concept-driven Off Policy Evaluation

Ritam Majumdar, Jack Teversham, Sonali Parbhoo

TL;DR

This work broadens Off-Policy Evaluation by introducing concept-based estimators that leverage interpretable concepts from Concept Bottleneck Models to reduce variance in batch-data evaluation. It defines Concept-based IS estimators (CIS and CPDIS), proves unbiasedness under known concepts and variance reduction relative to traditional OPE, and extends to unknown concepts via an end-to-end learning framework (PC-OPE) that optimizes concise, diverse concepts and concept-to-policy mappings. The approach enables targeted interventions on concepts for deeper insights into evaluation behavior and robustness, demonstrated on WindyGridworld and MIMIC-III with substantial variance reductions and improved interpretability. Limitations include potential bias and trajectory distribution mismatch when learning concepts, motivating future work on confounding, partial observability, and broader domains.

Abstract

Evaluating off-policy decisions using batch data poses significant challenges due to limited sample sizes leading to high variance. To improve Off-Policy Evaluation (OPE), we must identify and address the sources of this variance. Recent research on Concept Bottleneck Models (CBMs) shows that using human-explainable concepts can improve predictions and provide better understanding. We propose incorporating concepts into OPE to reduce variance. Our work introduces a family of concept-based OPE estimators, proving that they remain unbiased and reduce variance when concepts are known and predefined. Since real-world applications often lack predefined concepts, we further develop an end-to-end algorithm to learn interpretable, concise, and diverse parameterized concepts optimized for variance reduction. Our experiments with synthetic and real-world datasets show that both known and learned concept-based estimators significantly improve OPE performance. Crucially, we show that, unlike other OPE methods, concept-based estimators are easily interpretable and allow for targeted interventions on specific concepts, further enhancing the quality of these estimators.

Paper Structure

This paper contains 95 sections, 22 theorems, 74 equations, 10 figures, 3 tables, 1 algorithm.

Key Result

Theorem 5.3

Under known concepts, when the coverage assumption holds, both $\hat{V}_{\pi_e}^{CIS}$ and $\hat{V}_{\pi_e}^{CPDIS}$ are unbiased estimators of the true value function $V_{\pi_e}$. (Proof: see the appendix on theoretical proofs for known concepts.)
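To make the two estimators in this result concrete, here is a minimal sketch. It assumes they take the standard importance-sampling and per-decision importance-sampling forms, with action probabilities conditioned on a discrete concept rather than on the raw state. This summary does not give the definitions, so the function name `concept_is_estimates`, the tabular policy arrays `pi_e_c` and `pi_b_c`, and the discount `gamma` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def concept_is_estimates(concepts, actions, rewards, pi_e_c, pi_b_c, gamma=0.99):
    """Sketch of concept-based IS (CIS) and per-decision IS (CPDIS) estimates.

    Assumed (hypothetical) inputs:
      concepts: int array (N, T), discrete concept index at each step
      actions:  int array (N, T), action taken at each step
      rewards:  float array (N, T), reward at each step
      pi_e_c:   float array (num_concepts, num_actions), evaluation policy given concept
      pi_b_c:   float array (num_concepts, num_actions), behavior policy given concept
    Coverage is assumed: pi_b_c > 0 wherever pi_e_c > 0.
    """
    concepts = np.asarray(concepts)
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)

    # Per-step ratio pi_e(a_t | c_t) / pi_b(a_t | c_t), shape (N, T).
    ratios = pi_e_c[concepts, actions] / pi_b_c[concepts, actions]
    # Cumulative product over time gives rho_{0:t} for every t.
    cum_ratios = np.cumprod(ratios, axis=1)

    discounts = gamma ** np.arange(rewards.shape[1])
    discounted = discounts * rewards

    # CIS: weight the whole discounted return by the full-trajectory ratio.
    cis = float(np.mean(cum_ratios[:, -1] * discounted.sum(axis=1)))
    # CPDIS: weight each reward only by the ratio accumulated up to its step.
    cpdis = float(np.mean((cum_ratios * discounted).sum(axis=1)))
    return cis, cpdis


# Toy usage with two concepts and two actions (made-up numbers).
pi_b_c = np.array([[0.25, 0.75], [0.5, 0.5]])
pi_e_c = np.array([[0.5, 0.5], [0.2, 0.8]])
concepts = np.array([[0, 1], [1, 0]])
actions = np.array([[0, 1], [1, 1]])
rewards = np.array([[1.0, 0.0], [0.0, 1.0]])
print(concept_is_estimates(concepts, actions, rewards, pi_e_c, pi_b_c, gamma=0.9))
```

When the evaluation and behavior policies coincide, every ratio equals 1 and both estimates reduce to the average discounted return, which is a quick sanity check on the sketch.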

Figures (10)

  • Figure 1: Simple example of a state vs concept. In this scenario, the state is the viral load in a patient's blood, whereas the concept is defined as the viral load being above or below a certain threshold $x$. The concept divides patients into two groups, in which different treatments are administered, indicated by the frequency of syringes. We do evaluation based on these two conceptual groups.
  • Figure 2: WindyGridworld: Known concept-based estimators have lower variance and MSE and higher ESS than traditional OPE estimators, at the cost of higher bias. MIMIC: Known concept-based estimators reduce variance.
  • Figure 3: Inverse propensity score (IPS) comparisons under concepts and states. The IPS distribution is shifted toward lower values under concepts than under states. This indicates that the variance reduction from concepts comes from the smaller IPS values.
  • Figure 4: For both domains, unknown concept-based estimators show lower variance. In WindyGridworld, they improve MSE and ESS but exhibit higher bias compared to traditional OPE estimators.
  • Figure 5: Interpretation of Optimized Concepts. WindyGridworld: The first two subplots compare true oracle concepts with optimized concepts derived from the proposed methodology. Baseline with State Abstractions: The third subplot shows OPE performance as the number of state clusters increases, peaking at $K=33$ clusters before a spike in MSE and subsequent gradual improvement. The fourth subplot highlights state clusters at $K=33$, the optimal abstraction for OPE, which differs from both oracle and optimized concepts, underscoring the meaningfulness of learned concepts. MIMIC: Domain knowledge suggests patients with low urine output exhibit greater variance in learned concepts compared to high-output patients, revealing potential intervention targets.
  • ...and 5 more figures

Theorems & Definitions (28)

  • Definition 4.1: Concept-Based Importance Sampling (CIS)
  • Definition 4.2: Concept-Based Per-Decision Importance Sampling (CPDIS)
  • Theorem 5.3: Bias
  • Theorem 5.4: Variance comparison with traditional OPE estimators
  • Theorem 5.5: Variance comparison with MIS estimator
  • Theorem 5.6: Confidence bounds for Concept-based estimators
  • Theorem 6.1: Bias
  • Theorem 6.2: Variance comparison with traditional OPE estimators
  • Theorem 6.3: Variance comparison with MIS estimator
  • Theorem 6.4: Confidence bounds for Concept-based estimators
  • ...and 18 more