The Minimal Search Space for Conditional Causal Bandits

Francisco N. F. Q. Simoes, Itai Feigenbaum, Mehdi Dastani, Thijs van Ommen

TL;DR

This work tackles the minimal search-space problem in conditional causal bandits, where arms are single-node conditional interventions and the goal is to maximize a target variable $Y$. It provides a simple graphical characterization showing that the minimal set of nodes that must be tested is the LSCA closure of the parents of $Y$, denoted $\mathcal{L}^{\infty}(\mathrm{Pa}(Y))$, and introduces the C4 algorithm, which computes this set in $O(|V|+|E|)$ time. The authors prove an equivalence between conditional and deterministic atomic superiority, establish the correctness of both the characterization and the algorithm, and report experiments on random and real-world graphs in which C4 significantly prunes the search space and speeds up convergence when paired with CondIntUCB. These results enable scalable, faster learning in causal decision-making tasks by focusing exploration on a provably minimal node subset. The study thus provides a principled, efficient approach to accelerating causal bandit algorithms in settings with conditional interventions.

Abstract

Causal knowledge can be used to support decision-making problems. This has been recognized in the causal bandits literature, where a causal (multi-armed) bandit is characterized by a causal graphical model and a target variable. The arms are then interventions on the causal model, and rewards are samples of the target variable. Causal bandits were originally studied with a focus on hard interventions. We focus instead on cases where the arms are conditional interventions, which more accurately model many real-world decision-making problems by allowing the value of the intervened variable to be chosen based on the observed values of other variables. This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention, which maximizes the expected reward. We then propose an efficient algorithm with a time complexity of $O(|V| + |E|)$ to identify this minimal set of nodes. We prove that the graphical characterization and the proposed algorithm are correct. Finally, we empirically demonstrate that our algorithm significantly prunes the search space and substantially accelerates convergence rates when integrated into standard multi-armed bandit algorithms.
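To make the graphical characterization more concrete, the sketch below computes an LSCA closure by brute force on a small DAG. It is not the C4 algorithm and does not run in $O(|V| + |E|)$ time. The definitions it uses are assumptions, since this page does not reproduce the paper's formal ones: a strict common ancestor of two nodes is taken to be a proper ancestor of both, it is "lowest" if none of its descendants is also a strict common ancestor, and the closure is the smallest superset of the seed set that is closed under pairwise LSCAs. The function names and the `parents` dictionary encoding are hypothetical.

```python
from itertools import combinations


def proper_ancestors(parents, node):
    """Return all proper ancestors of `node` (the node itself is excluded)."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def lsca_pair(parents, x, w):
    """Lowest strict common ancestors of x and w (assumed definition)."""
    common = proper_ancestors(parents, x) & proper_ancestors(parents, w)
    # Keep z only if no other common ancestor lies strictly below it.
    return {
        z for z in common
        if not any(z in proper_ancestors(parents, s) for s in common)
    }


def lsca_closure(parents, seed):
    """Smallest superset of `seed` closed under pairwise LSCAs (brute force)."""
    closure = set(seed)
    while True:
        new = set()
        for x, w in combinations(sorted(closure), 2):
            new |= lsca_pair(parents, x, w)
        if new <= closure:
            return closure
        closure |= new


# Diamond: A -> B -> Y and A -> C -> Y, so Pa(Y) = {B, C}.
diamond = {"Y": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
diamond_set = lsca_closure(diamond, diamond["Y"])

# Chain: A -> B -> Y, so Pa(Y) = {B}.
chain = {"Y": ["B"], "B": ["A"], "A": []}
chain_set = lsca_closure(chain, chain["Y"])
```

Under these assumed definitions, the diamond keeps `A` alongside `B` and `C`, because `A` is the lowest strict common ancestor of the two parents of `Y`, while the chain keeps only `B`.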

Paper Structure

This paper contains 27 sections, 28 theorems, 31 equations, 6 figures, 1 algorithm.

Key Result

Proposition 3

Let $X$, $W$, $Y$ be nodes in a DAG $G$. Then $X$ is average conditional-interventionally superior to $W$ relative to $Y$ in $G$ if and only if $X$ is atomic-interventionally superior to $W$ relative to $Y$ in $G$. That is:

Figures (6)

  • Figure 1: Examples illustrating heuristics behind the graphical characterization of the minimal interventionally superior set. The gray nodes are those that should be tested by conditional causal bandit algorithms.
  • Figure 2: A $\Lambda$-structure over $(\mathbf{U}, \mathbf{U})$. The LSCA closure $\mathcal{L}^{\infty}(\mathbf{U})$ of a set $\mathbf{U}$ is the set of all such structures.
  • Figure 3: Illustration of the connectors in a graph. The square nodes belong to $\mathbf{U}$, the connector of each node is written in red next to its node, and the LSCA closure $\mathcal{L}^{\infty}(\mathbf{U})$ consists of the gray nodes.
  • Figure 4: Comparison of cumulative regret curves for node selection using a UCB-based bandit algorithm for conditional interventions, with (mGISS) and without (brute-force) pruning the search space. These curves were obtained by averaging over $500$ runs, on four bnlearn datasets (asia, sachs, child). For every dataset, pruning the search space with the C4 algorithm results in faster convergence and smaller values of regret.
  • Figure 5: Fraction of nodes remaining after applying our search space filtering procedure, on random graphs. $1000$ graphs were generated for each pair $($number of nodes, expected degree$)$. The impact of our method decreases with the expected degree, and increases with the number of nodes.
  • ...and 1 more figure

Theorems & Definitions (69)

  • Definition 1: Conditional-Intervention Superiority
  • Definition 2: Deterministic Atomic Intervention Superiority
  • Definition 3
  • Proposition 3: Conditional vs Atomic superiority
  • Remark 4
  • Remark 5
  • Definition 6: GISS and mGISS
  • Proposition 6: Uniqueness of the mGISS
  • Definition 7: Lowest Strict Common Ancestors of a Pair of Nodes
  • Definition 8: Lowest Strict Common Ancestors of a Set
  • ...and 59 more