On Transportability for Structural Causal Bandits

Min Woo Park, Sanghack Lee

TL;DR

This paper investigates the structural causal bandit with transportability, where priors from source environments are fused to enhance learning in the deployment setting. The resulting algorithm achieves a sub-linear regret bound with an explicit dependence on the informativeness of the prior data.

Abstract

Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterization for identifying actions that are unable to maximize rewards by leveraging prior knowledge of the underlying causal structure. While such knowledge enables an agent to estimate the expected rewards of certain actions based on others in online interactions, there has been little guidance on how to transfer information inferred from arbitrary combinations of datasets collected under different conditions -- observational or experimental -- and from heterogeneous environments. In this paper, we investigate the structural causal bandit with transportability, where priors from the source environments are fused to enhance learning in the deployment setting. We demonstrate that it is possible to exploit invariances across environments to consistently improve learning. The resulting bandit algorithm achieves a sub-linear regret bound with an explicit dependence on informativeness of prior data, and it may outperform standard bandit approaches that rely solely on online learning.
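
As a rough illustration of how prior information can shrink exploration, the sketch below runs a standard UCB1 bandit on Bernoulli arms and uses per-arm bounds on the mean reward in two ways: arms whose upper bound falls below the best lower bound are never played, and each UCB index is capped at its arm's upper bound. This is not the paper's algorithm. The function name `clipped_ucb`, the `prior_bounds` argument, and the example numbers are all hypothetical, and the bounds are simply assumed to be valid and given rather than derived from source environments.

```python
import math
import random


def clipped_ucb(arm_means, prior_bounds, horizon, seed=0):
    """Run UCB1 on Bernoulli arms, using prior bounds to prune and clip.

    arm_means: true success probabilities (used only to simulate rewards).
    prior_bounds: hypothetical (lower, upper) bounds on each arm's mean,
        standing in for information carried over from prior data.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    best_mean = max(arm_means)

    # An arm whose upper bound is below the largest lower bound
    # cannot be optimal, so it is excluded before any online play.
    best_lower = max(lo for lo, _ in prior_bounds)
    active = [a for a in range(k) if prior_bounds[a][1] >= best_lower]

    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0

    def index(a, t):
        # UCB1 index, capped at the prior upper bound for this arm.
        mean = sums[a] / counts[a]
        bonus = math.sqrt(2.0 * math.log(t) / counts[a])
        return min(mean + bonus, prior_bounds[a][1])

    for t in range(1, horizon + 1):
        untried = [a for a in active if counts[a] == 0]
        if untried:
            arm = untried[0]  # play every surviving arm once first
        else:
            arm = max(active, key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - arm_means[arm]
    return counts, regret


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]
    informative = [(0.0, 0.4), (0.2, 0.9), (0.6, 1.0)]
    vacuous = [(0.0, 1.0)] * 3
    print(clipped_ucb(means, informative, horizon=1000))
    print(clipped_ucb(means, vacuous, horizon=1000))
```

With the vacuous bounds `(0.0, 1.0)` the procedure reduces to plain UCB1, so the two calls in the usage example give a simple side-by-side view of what informative bounds change.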

Paper Structure

This paper contains 57 sections, 24 theorems, 31 equations, 9 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

Let $\mathbf{r}^\star$ be an optimal action with respect to $\langle \mathcal{G}, Y, \mathbf{N}\rangle$ where $\mathbf{N}$ is a subset of $\mathbf{N}^\ast$. Let $\mathbf{W}$ be a non-POIS with respect to $\langle \mathcal{G}, Y, \mathbf{N}^\ast \rangle$. Then $\mathbb{E}_{P^\ast_{\mathbf{x}^\star}}$ ...

Figures (9)

  • Figure 1: Diagram encoding causal relations.
  • Figure 2: Collective selection diagrams for (a) the introductory example and (b) $\Delta^1 = \{A\}$ (red) and $\Delta^2 = \{C\}$ (blue).
  • Figure 3: Hierarchical relationships between POMISs under different constraints. Arrows indicate the direction of dominance relations.
  • Figure 4:
  • Figure 5:
  • ...and 4 more figures

Theorems & Definitions (49)

  • Definition 1: Environment discrepancy
  • Definition 2: Selection diagram
  • Definition 3: Structural causal bandits with transportability
  • Definition 4: Minimality (lee2018structural)
  • Definition 5: Possibly-optimal intervention set
  • Theorem 1: Dominance relationship
  • Definition 6: Transportability (lee2020general)
  • Proposition 1
  • Theorem 2: Causal bounds
  • Corollary 1
  • ...and 39 more