The Minimal Search Space for Conditional Causal Bandits
Francisco N. F. Q. Simoes, Itai Feigenbaum, Mehdi Dastani, Thijs van Ommen
TL;DR
This work tackles the minimal search-space problem in conditional causal bandits, where arms are single-node conditional interventions and the goal is to maximize the expected value of a target variable $Y$. It gives a simple graphical characterization: the minimal set of nodes that must be considered is the LSCA-closure of the parents of $Y$, denoted $\mathcal{L}^{\infty}(\mathrm{Pa}(Y))$. It also introduces the C4 algorithm, which computes this set in $O(|V|+|E|)$ time. The authors prove an equivalence between conditional and deterministic atomic superiority, establish the correctness of both the characterization and the algorithm, and report experiments on random and real-world graphs in which the search space is pruned substantially and convergence is faster when the pruned search space is used with CondIntUCB. By restricting exploration to a provably minimal subset of nodes, the approach makes causal bandit learning with conditional interventions faster and more scalable.
Abstract
Causal knowledge can be used to support decision-making. This has been recognized in the causal bandits literature, where a causal (multi-armed) bandit is characterized by a causal graphical model and a target variable. The arms are then interventions on the causal model, and rewards are samples of the target variable. Causal bandits were originally studied with a focus on hard interventions. We focus instead on cases where the arms are conditional interventions, which more accurately model many real-world decision-making problems by allowing the value of the intervened variable to be chosen based on the observed values of other variables. This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention, that is, the one that maximizes the expected reward. We then propose an efficient algorithm, with time complexity $O(|V| + |E|)$, to identify this minimal set of nodes. We prove that the graphical characterization and the proposed algorithm are correct. Finally, we empirically demonstrate that our algorithm significantly prunes the search space and substantially accelerates convergence when integrated into standard multi-armed bandit algorithms.
