
Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference

Fateme Jamshidi, Mohammad Shahverdikondori, Negar Kiyavash

Abstract

We study multi-armed bandits under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given graph. This induces an exponentially large action space, making standard approaches computationally impractical. We propose a novel algorithm that uses the local graph structure to minimize regret. We derive a graph-dependent upper bound on cumulative regret that improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the graph. These bounds show that for both dense and sparse graphs, our algorithm is nearly optimal, with matching upper and lower bounds up to logarithmic factors. When the interference graph is unknown, a variant of our algorithm is Pareto optimal: no algorithm can uniformly outperform it across all instances. We complement our theoretical results with numerical experiments, showing that our approach outperforms the baseline methods.
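To make the gap between the global and local action spaces concrete, here is a minimal sketch. It assumes each of the N units receives one of k treatment levels (the symbols N and k appear in Theorem 3.3) and that the interference graph is undirected and stored as a hypothetical adjacency dictionary. Treating every joint assignment as an arm gives k^N actions. A unit whose reward depends only on itself and its neighbours sees just k^(deg+1) local configurations. This is an illustration of the counting argument, not the authors' algorithm.

```python
def global_action_count(num_units: int, num_treatments: int) -> int:
    """Number of joint treatment vectors: k choices for each of N units, i.e. k**N."""
    return num_treatments ** num_units


def local_configuration_counts(adjacency: dict[int, set[int]], num_treatments: int) -> dict[int, int]:
    """For each unit i, the number of treatment assignments to i and its
    neighbours, i.e. k**(deg(i) + 1). Assumes a symmetric (undirected) adjacency."""
    return {i: num_treatments ** (len(nbrs) + 1) for i, nbrs in adjacency.items()}


# Hypothetical example: a path graph 0-1-2-3-4 with binary treatments (k = 2).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(global_action_count(len(path), 2))    # 32
print(local_configuration_counts(path, 2))  # {0: 4, 1: 8, 2: 8, 3: 8, 4: 4}
```

The global count grows exponentially in N. Each local count depends only on a unit's degree, which is the structure a graph-aware method can exploit.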

Paper Structure

This paper contains 17 sections, 18 theorems, 85 equations, 4 figures, 1 algorithm.

Key Result

Theorem 3.3

[Graph-Partitioned Regret Upper Bound] The expected cumulative regret of Algorithm 1 with $\delta = (T^2N \sum_{j \in [M]} k^{D_j+1})^{-1}$, interacting with any instance with $1$-sub-Gaussian rewards and interference graph $\mathcal{G}$ partitioned into $P_1, P_2, \ldots, P_M$, satisfies …
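The confidence parameter in Theorem 3.3 is simple arithmetic once T, N, k and the per-partition exponents are fixed. The sketch below evaluates $\delta = (T^2 N \sum_{j \in [M]} k^{D_j+1})^{-1}$ exactly with rational arithmetic. It treats each $D_j$ as a given non-negative integer attached to partition $P_j$. Its precise definition is in the paper body and is not reproduced in this excerpt. The function name `confidence_parameter` is hypothetical.

```python
from fractions import Fraction


def confidence_parameter(T: int, N: int, k: int, D: list[int]) -> Fraction:
    """Exact delta = 1 / (T^2 * N * sum_j k^(D_j + 1)), one D_j per partition P_j."""
    if T < 1 or N < 1 or k < 1 or not D:
        raise ValueError("T, N, k must be positive and D must be non-empty")
    total = sum(k ** (d + 1) for d in D)  # sum over the M partitions
    return Fraction(1, T * T * N * total)


# Hypothetical values: horizon T = 1000, N = 10 units, k = 2 treatments,
# two partitions with D_1 = 2 and D_2 = 3, so the sum is 2^3 + 2^4 = 24.
delta = confidence_parameter(T=1000, N=10, k=2, D=[2, 3])
print(delta)         # 1/240000000
print(float(delta))  # roughly 4.17e-09
```

Using `Fraction` avoids floating-point underflow concerns when T or the exponents are large. The value can be converted to `float` only when it is passed to a confidence-bound computation.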

Figures (4)

  • Figure 1: Graph partitions defined by the partition relation in the main text; same-color nodes share a partition.
  • Figure 2: A $(3,2)$-clique-sparse graph with colored clusters.
  • Figure 3: Average regret vs. number of units ($N$).
  • Figure 4: Comparison of average regret for various algorithms.

Theorems & Definitions (34)

  • Definition 2.1: Regret
  • Remark 2.2
  • Definition 3.1: Doubly-Independent Set
  • Definition 3.2: Square Chromatic Number
  • Theorem 3.3
  • Corollary 3.4
  • Remark 3.5
  • Theorem 4.1
  • Theorem 4.2
  • Corollary 4.3
  • ...and 24 more
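Definition 3.2 is not reproduced in this excerpt. The sketch below assumes the standard notion of square chromatic number: the chromatic number of the square graph $\mathcal{G}^2$, in which two units are adjacent whenever their distance in $\mathcal{G}$ is one or two. Any two units with the same color in $\mathcal{G}^2$ are at distance at least three, so their closed neighbourhoods are disjoint. The helper names are hypothetical. The exact backtracking search is exponential and meant only for small illustrative graphs.

```python
def square_graph(adjacency: dict[int, set[int]]) -> dict[int, set[int]]:
    """Return G^2: u and v are adjacent iff their distance in G is 1 or 2.
    Assumes a symmetric (undirected) adjacency dictionary."""
    sq = {v: set(nbrs) for v, nbrs in adjacency.items()}
    for v, nbrs in adjacency.items():
        for u in nbrs:
            sq[v] |= adjacency[u]  # add neighbours of neighbours
        sq[v].discard(v)           # no self-loops
    return sq


def chromatic_number(adjacency: dict[int, set[int]]) -> int:
    """Exact chromatic number by backtracking (exponential; small graphs only)."""
    # Colour high-degree vertices first to prune the search earlier.
    vertices = sorted(adjacency, key=lambda v: -len(adjacency[v]))

    def colorable(c: int) -> bool:
        colors: dict[int, int] = {}

        def assign(i: int) -> bool:
            if i == len(vertices):
                return True
            v = vertices[i]
            used = {colors[u] for u in adjacency[v] if u in colors}
            for col in range(c):
                if col not in used:
                    colors[v] = col
                    if assign(i + 1):
                        return True
                    del colors[v]
            return False

        return assign(0)

    c = 1 if vertices else 0
    while not colorable(c):
        c += 1
    return c


# Hypothetical example: the path 0-1-2-3-4 needs 2 colours, its square needs 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(chromatic_number(path))                # 2
print(chromatic_number(square_graph(path)))  # 3
```

For a star with four leaves, every pair of units is within distance two, so the square is a complete graph on five vertices and the square chromatic number is 5.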