Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms

Katherine Avery, Chinmay Pendse, David Jensen

Abstract

Causal graphical models can encode large amounts of structural knowledge, both from the background knowledge of domain experts and from structure discovered in randomized experiments or observational data. However, though we may know the general structure of causal relationships, we often do not know the exact causal mechanisms. In this work, we propose a causal multi-armed bandit evaluation and learning algorithm that can reason effectively despite uncertainty over conditional probability distributions. Further, we show how conditional independence testing can be used to choose variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations than traditional approaches, particularly as the range of possible causal mechanisms grows. Moreover, the SEM approach learns low-variance policies, and it learns an optimal policy, assuming the model is sufficiently well-specified. Traditional approaches can converge to local extrema or fail to converge at all.
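The abstract's central idea is to evaluate a policy when the structural equations are known only up to a set of possible mechanisms. The sketch below illustrates that idea and is not the authors' SEM algorithm. It rests on hypothetical assumptions. Uncertainty is modeled as a finite list of candidate linear mechanisms, with one (slope, intercept) pair per action. A policy is then scored by its worst-case average outcome across those candidates. The names `simulate_value` and `worst_case_value`, and all the numbers, are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical mechanism: for each action, outcome = slope * s + intercept (+ noise),
# where s is a scalar summary of the covariates (an illustrative assumption).
Mechanism = Dict[int, Tuple[float, float]]
Policy = Callable[[float], int]


def simulate_value(policy: Policy, mechanism: Mechanism,
                   contexts: Sequence[float], noise_sd: float = 0.0,
                   seed: int = 0) -> float:
    """Average outcome of `policy` when outcomes are generated by `mechanism`."""
    rng = random.Random(seed)
    total = 0.0
    for s in contexts:
        slope, intercept = mechanism[policy(s)]
        outcome = slope * s + intercept
        if noise_sd > 0:
            outcome += rng.gauss(0.0, noise_sd)  # additive Gaussian error term
        total += outcome
    return total / len(contexts)


def worst_case_value(policy: Policy, candidates: List[Mechanism],
                     contexts: Sequence[float]) -> float:
    """Robust (pessimistic) value: the minimum over all candidate mechanisms."""
    return min(simulate_value(policy, m, contexts) for m in candidates)


# Two made-up candidate mechanisms that disagree about both actions.
candidates = [
    {0: (1.0, 0.0), 1: (0.0, 2.0)},
    {0: (3.0, 0.0), 1: (0.0, 0.5)},
]
contexts = [0.0, 2.0]
print(worst_case_value(lambda s: 0, candidates, contexts))  # 1.0
print(worst_case_value(lambda s: 1, candidates, contexts))  # 0.5
```

Under the first candidate mechanism, always taking action 1 looks better (2.0 vs. 1.0). However, its worst case is lower (0.5 vs. 1.0), so a worst-case criterion prefers action 0.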

Paper Structure

This paper contains 42 sections, 19 equations, 12 figures, and 3 algorithms.

Figures (12)

  • Figure 1: Synthetic results. The evaluation plots show (top left) bias vs. amount of uncertainty; (top middle) bias vs. number of samples; and (top right) bias vs. number of uncertain variables. The learning plots show (bottom left) regret vs. amount of uncertainty; (bottom middle) regret vs. number of samples; and (bottom right) regret vs. number of uncertain variables for the synthetic data. FDRO did not converge when learning.
  • Figure 2: Voting evaluation and learning results. For the voting dataset, the plots show (left) bias vs. number of samples and (right) regret vs. number of samples.
  • Figure 3: Sensitivity analysis. From left to right, the plots show bias vs. percent of misspecified parents, bias vs. number of uncertain variables misidentified, bias vs. number of error terms misspecified, and bias vs. number of binary terms approximated. The top row shows results for the synthetic data, and the bottom row shows results for the voting data. Note that the y-axes are rescaled.
  • Figure 4: Example mechanism for $X_5$. The mechanism for $X_5$ is a different straight line under each action. For a given value of $X_0+X_1+X_2$, the best action is the one whose worst-case value of $X_5$ is largest.
  • Figure 5: Synthetic graph. This causal graph corresponds to the relationships in the synthetic data described in the appendix on the synthetic dataset. $A$ represents an intervention on $X_5$. Because this graph corresponds to the training data, $A$ is not connected to the covariates $X_0$, $X_1$, and $X_2$: the policy $\pi_0$ that generated the training data took random actions.
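The decision rule in the Figure 4 caption can be sketched as a max-min computation. For each value of $X_0+X_1+X_2$, the sketch takes the worst case of $X_5$ over each action's candidate lines, then picks the action with the largest worst case. It assumes, hypothetically, that the uncertainty about each action's line is a finite set of candidate (slope, intercept) pairs. The function `robust_action` and the numbers below are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

# (slope, intercept) of one candidate straight-line mechanism for X5
Line = Tuple[float, float]


def robust_action(x0: float, x1: float, x2: float,
                  candidates: Dict[str, List[Line]]) -> str:
    """Pick the action whose worst-case X5 is largest (a max-min rule)."""
    s = x0 + x1 + x2  # the caption's quantity X0 + X1 + X2

    def worst_case_x5(lines: List[Line]) -> float:
        # Lowest X5 predicted by any candidate line for this action
        return min(slope * s + intercept for slope, intercept in lines)

    # max() returns the first action attaining the largest worst case
    return max(candidates, key=lambda a: worst_case_x5(candidates[a]))


# Hypothetical uncertainty sets: two candidate lines per action.
candidates = {
    "a0": [(1.0, 0.0), (0.5, 0.0)],  # increasing in s, uncertain slope
    "a1": [(0.0, 1.0), (0.0, 0.8)],  # flat, uncertain intercept
}
print(robust_action(0.0, 0.0, 0.0, candidates))  # a1 (worst cases: 0.0 vs 0.8)
print(robust_action(1.0, 1.0, 2.0, candidates))  # a0 (worst cases: 2.0 vs 0.8)
```

With these made-up candidates and nonnegative $X_0+X_1+X_2$, the worst case for `a0` is the flatter line, $0.5(X_0+X_1+X_2)$. The robust choice therefore switches from `a1` to `a0` once the sum exceeds 1.6.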