Causally Abstracted Multi-armed Bandits

Fabio Massimo Zennaro, Nicholas Bishop, Joel Dyer, Yorgos Felekis, Anisoara Calinescu, Michael Wooldridge, Theodoros Damoulas

TL;DR

This work introduces CAMAB, a framework that enables transfer learning across causal MABs defined on different variable sets by leveraging causal abstraction. It defines two quantitative measures, the interventional-consistency error $e(\boldsymbol{\alpha})$ and the reward discrepancy $s(\boldsymbol{\alpha})$, to bound the difference in expected rewards between base and abstract CMABs. Three representative transfer strategies are analyzed: transferring the optimal action (TOpt), transferring actions via imitation (IMIT), and transferring expected values (TExp); each comes with theoretical insights on when it preserves optimality and how regret scales. The authors provide extensive theoretical results and experiments, including an online advertising case, illustrating both the potential gains and pitfalls of CAMAB-based transfer. Overall, CAMAB broadens transfer learning in bandits to multi-resolution causal settings, offering practical guidance on when and how to transfer information across related CMABs.

Abstract

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. Most prior work studies and solves individual MABs and CMABs in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.

Paper Structure (46 sections, 12 theorems, 67 equations, 13 figures, 2 tables, 2 algorithms)

This paper contains 46 sections, 12 theorems, 67 equations, 13 figures, 2 tables, 2 algorithms.

Key Result

Proposition 4.1

Given a CAMAB, the difference in expected rewards $|\mu_{a_i} - \mu'_{\boldsymbol{\alpha}(a_i)}|$ is bounded by $e(\boldsymbol{\alpha}) + s(\boldsymbol{\alpha})$.
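
The bound in Proposition 4.1 can be illustrated with a minimal sketch. The definitions of $e(\boldsymbol{\alpha})$ and $s(\boldsymbol{\alpha})$ are not reproduced on this page, so the code below takes them as given numbers rather than computing them. The arm names, reward values, action map, and the helper names `reward_gaps` and `satisfies_bound` are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def reward_gaps(
    mu_base: Dict[str, float],
    mu_abs: Dict[str, float],
    alpha: Dict[str, str],
) -> Dict[str, float]:
    """Return |mu_a - mu'_{alpha(a)}| for every base action a."""
    return {a: abs(mu_base[a] - mu_abs[alpha[a]]) for a in mu_base}


def satisfies_bound(
    mu_base: Dict[str, float],
    mu_abs: Dict[str, float],
    alpha: Dict[str, str],
    e_alpha: float,
    s_alpha: float,
) -> bool:
    """Check that every per-action gap is at most e(alpha) + s(alpha)."""
    bound = e_alpha + s_alpha
    return all(gap <= bound for gap in reward_gaps(mu_base, mu_abs, alpha).values())


# Hypothetical expected rewards for three base arms and two abstract arms.
mu_base = {"a1": 0.30, "a2": 0.55, "a3": 0.60}
mu_abs = {"b1": 0.35, "b2": 0.50}

# Hypothetical action map: two base arms collapse onto one abstract arm.
alpha = {"a1": "b1", "a2": "b2", "a3": "b2"}

# Assumed values for e(alpha) and s(alpha); taken as inputs, not derived.
e_alpha, s_alpha = 0.08, 0.05

gaps = reward_gaps(mu_base, mu_abs, alpha)
ok = satisfies_bound(mu_base, mu_abs, alpha, e_alpha, s_alpha)
```

With these made-up numbers every gap stays below $e(\boldsymbol{\alpha}) + s(\boldsymbol{\alpha}) = 0.13$, so the check passes. Shrinking the two assumed values makes it fail, which shows how the bound loosens or tightens with the quality of the abstraction.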

Figures (13)

  • Figure 1: Base model $\mathcal{M}$ (left) and abstracted model $\mathcal{M}'$ (right).
  • Figure 2: Diagrams illustrating the (a) TOpt, (b) IMIT, (c) TExp algorithms
  • Figure 3: (a) Simple regret for TOpt from Ex. \ref{ex:scenario1} for an exact and maximum-preserving abstraction (blue lines) and an exact but not maximum-preserving abstraction (green lines). (b) Regret difference for IMIT from Ex. \ref{ex:scenario2} using abstractions aggregating values differently (red and blue lines). (c) Cumulative regret for TExp from Ex. \ref{ex:scenario3} for an abstraction preserving domains (blue lines) and an abstraction changing domains (green lines). See respective examples for further explanation. (d) Cumulative regret on the online advertising scenario.
  • Figure 4: Models for the first counterexample.
  • Figure 5: Model for the second counterexample.
  • ...and 8 more figures

Theorems & Definitions (25)

  • Definition 2.1: SCM [pearl2009causality]
  • Definition 2.2: Intervention [pearl2009causality]
  • Definition 2.3: Abstraction [rischel2020category]
  • Definition 2.4: Interventional consistency error [rischel2020category, zennaro2023quantifying]
  • Definition 3.1: CAMAB
  • Example 3.2
  • Proposition 4.1: Bound on difference of expected rewards
  • Proposition 5.1: Biasedness of TOpt
  • Example 5.2
  • Example 5.3
  • ...and 15 more