Causal Bandits with General Causal Models and Interventions

Zirui Yan; Dennis Wei; Dmitriy Katz-Rogozhnikov; Prasanna Sattigeri; Ali Tajer

TL;DR

This paper studies causal bandits in which interventions are applied to a DAG-structured system whose structural causal models (SCMs) are unknown and drawn from a class $\mathcal{F}$ of Lipschitz-continuous functions. It introduces two algorithms, GCB-UCB and GCB-TS, whose analysis uses the eluder dimension and covering number of $\mathcal{F}$ to obtain regret bounds of the form $\mathcal{O}\left(K d^{L-1} \sqrt{T \operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))}\right)$, in which the graph size $N$ enters only logarithmically. The framework supports generalized soft interventions with continuum granularity and provides refined sublinear regret bounds for linear, polynomial, and neural network SCMs, together with corresponding minimax lower bounds. The results extend the causal bandit literature beyond linear or Gaussian assumptions and show that the dependence on graph size diminishes as the horizon $T$ grows. The results are relevant to sequential experimental design in complex causal systems in domains such as biology, economics, and AI safety.

Abstract

This paper considers causal bandits (CBs) for the sequential design of interventions in a causal system. The objective is to optimize a reward function via minimizing a measure of cumulative regret with respect to the best sequence of interventions in hindsight. The paper advances the results on CBs in three directions. First, the structural causal models (SCMs) are assumed to be unknown and drawn arbitrarily from a general class $\mathcal{F}$ of Lipschitz-continuous functions. Existing results are often focused on (generalized) linear SCMs. Second, the interventions are assumed to be generalized soft with any desired level of granularity, resulting in an infinite number of possible interventions. The existing literature, in contrast, generally adopts atomic and hard interventions. Third, we provide general upper and lower bounds on regret. The upper bounds subsume (and improve) known bounds for special cases. The lower bounds are generally hitherto unknown. These bounds are characterized as functions of the (i) graph parameters, (ii) eluder dimension of the space of SCMs, denoted by $\operatorname{dim}(\mathcal{F})$, and (iii) the covering number of the function space, denoted by ${\rm cn}(\mathcal{F})$. Specifically, the cumulative achievable regret over horizon $T$ is $\mathcal{O}(K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))})$, where $K$ is related to the Lipschitz constants, $d$ is the graph's maximum in-degree, and $L$ is the length of the longest causal path. The upper bound is further refined for special classes of SCMs (neural network, polynomial, and linear), and their corresponding lower bounds are provided.
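
To make the scaling in the stated bound concrete, the following minimal sketch evaluates the expression $K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))}$ with all constants and logarithmic factors hidden by $\mathcal{O}(\cdot)$ ignored. The function name `regret_bound_scale` and every numeric value are hypothetical, and the covering number is treated as a fixed input, which is a simplifying assumption rather than something the abstract states.

```python
import math

def regret_bound_scale(K, d, L, T, eluder_dim, covering_number):
    """Evaluate K * d^(L-1) * sqrt(T * dim(F) * log(cn(F))), ignoring constants."""
    if covering_number <= 1:
        raise ValueError("covering_number must exceed 1 so that log(cn) > 0")
    return K * d ** (L - 1) * math.sqrt(T * eluder_dim * math.log(covering_number))

# Hypothetical values, chosen only for illustration.
params = dict(K=1.0, d=3, L=2, eluder_dim=10, covering_number=1e6)
for T in (1_000, 4_000, 16_000):
    print(T, round(regret_bound_scale(T=T, **params), 1))
```

Under these assumptions, quadrupling $T$ doubles the value, which is the $\sqrt{T}$ behaviour of the bound, while increasing $L$ by one multiplies it by $d$.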

Paper Structure (33 sections, 28 theorems, 211 equations, 13 figures, 1 table, 2 algorithms)

INTRODUCTION
GCB MODEL AND OBJECTIVE
Causal Graph with General SCM
Generalised Soft Interventions
Problem Statement
PRELIMINARIES
Eluder Dimension and Covering Number
GCB ALGORITHMS
GCB-UCB Algorithm
GCB-TS Algorithm
REGRET GUARANTEE
Regret Upper Bounds
Refined Regret Bounds for Special SCMs
CONCLUSION
Regret Upper Bound -- General SCM
...and 18 more sections

Key Result

Lemma 5.1

For any $i\in[N]$ and $\forall \delta>0$ and $\forall \alpha>0$

Figures (13)

Figure 1: Construction of $\tilde{\mathcal{G}}$ based on $\mathcal{G}$ by embedding $s$ nodes between a node (node 4) and its parents (nodes 1,2,3).
Figure 2: Hierarchical graph with degree $d$ and maximum causal path length $L$.
Figure 3: Hierarchical graph with degree $d=3$ and causal length $L=2$.
Figure 4: Cumulative regret vs $T$ (polynomial)
Figure 5: Cumulative regret vs $T$ (neural network)
...and 8 more figures
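
The graph parameters $d$ (maximum in-degree) and $L$ (longest causal path) named in the captions can be checked on a small example. The sketch below builds one plausible layered DAG and computes both quantities. The construction in `layered_dag` is an assumption made for illustration and is not taken from the paper's figures; path length is counted in edges, which is also an assumption.

```python
def layered_dag(d, L):
    """Return {node: [parents]} for a layered DAG (hypothetical construction).

    Layers 0..L-1 hold d nodes each, every node in layer l >= 1 has all d nodes
    of layer l-1 as parents, and one final node has all of layer L-1 as parents.
    Requires d >= 1 and L >= 1.
    """
    parents = {}
    for layer in range(L):
        for j in range(d):
            prev = [(layer - 1, k) for k in range(d)] if layer > 0 else []
            parents[(layer, j)] = prev
    parents[(L, 0)] = [(L - 1, k) for k in range(d)]
    return parents

def max_in_degree(parents):
    """Largest number of parents of any node."""
    return max(len(p) for p in parents.values())

def longest_path_length(parents):
    """Longest directed path, counted in edges, via memoized recursion."""
    memo = {}

    def depth(node):
        if node not in memo:
            memo[node] = 1 + max(map(depth, parents[node])) if parents[node] else 0
        return memo[node]

    return max(depth(n) for n in parents)

g = layered_dag(d=3, L=2)
print(len(g), max_in_degree(g), longest_path_length(g))
```

For `d=3` and `L=2` this assumed construction has seven nodes, maximum in-degree 3, and a longest path of two edges.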

Theorems & Definitions (42)

Definition 3.1: $\epsilon$-dependent
Definition 3.2: $\epsilon$-eluder dimension
Definition 3.3: Covering number
Lemma 5.1
Lemma 5.2: Compounding Error
Theorem 5.3: Regret Upper Bound
Corollary 5.4: Bayesian Regret Upper Bound
Theorem 5.5
Theorem 5.6: Minimax Lower Bound for Linear SCMs
Theorem 5.7
...and 32 more
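
As a rough illustration of the covering number listed in Definition 3.3, the sketch below counts the centers of a greedy $\epsilon$-cover for a finite set of functions evaluated on a shared grid. It assumes the standard sup-norm notion of a cover, since the definition's text is not reproduced here. The function class, the grid, and the name `greedy_cover_size` are hypothetical, and the greedy count is only an upper bound on the covering number of this finite set.

```python
import numpy as np

def greedy_cover_size(values, eps):
    """Size of a greedy eps-cover of the rows of `values` under the sup norm.

    Each row holds one function evaluated on a shared grid. The greedy count
    upper-bounds the covering number of this finite set; it is not
    guaranteed to be the minimum.
    """
    uncovered = np.ones(len(values), dtype=bool)
    centers = 0
    while uncovered.any():
        idx = int(np.argmax(uncovered))  # first uncovered function becomes a center
        dist = np.max(np.abs(values - values[idx]), axis=1)
        uncovered &= dist > eps  # everything within eps of the center is now covered
        centers += 1
    return centers

# Hypothetical class: f_theta(x) = theta * x on x in [0, 1], theta in [-1, 1].
grid = np.linspace(0.0, 1.0, 51)
thetas = np.linspace(-1.0, 1.0, 201)
values = thetas[:, None] * grid[None, :]
for eps in (0.5, 0.1, 0.02):
    print(eps, greedy_cover_size(values, eps))
```

The count grows as $\epsilon$ shrinks, which is why the covering number is reported at a given scale.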
