Networked Restless Multi-Arm Bandits with Reinforcement Learning

Hanmo Zhang, Zenghui Sun, Kai Wang

TL;DR

Traditional RMABs assume that arms are independent. This paper removes that limitation by modeling interactions between arms with the independent cascade model, yielding the Networked RMAB framework. It derives a Bellman equation for Networked RMAB, proves that the Q-function is submodular in the action, and shows that hill-climbing action selection gives a $1-\frac{1}{e}$ approximation in each Bellman update, with convergence established through a $\gamma$-contraction argument on a multi-Bellman operator. A scalable Q-learning approach, with DQN and Graph Neural Network (GNN) variants that use hill-climbing, is validated on real-world network data, where it outperforms network-blind baselines. The work combines theoretical guarantees, scalable algorithms, and empirical evidence that accounting for network effects improves resource allocation decisions in public health and related domains.

Abstract

Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied to resource allocation and intervention optimization problems in public health. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals that can be common and significant in real-world environments. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for Networked RMAB and present its computational challenge due to exponentially large action and state spaces. To resolve the computational challenge, we establish the submodularity of the Bellman equation and apply the hill-climbing algorithm to achieve a $1-\frac{1}{e}$ approximation guarantee in Bellman updates. Lastly, we prove that the approximate Bellman updates are guaranteed to converge by a modified contraction analysis. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting. Experimental results on real-world graph data demonstrate that our Q-learning approach outperforms both $k$-step look-ahead and network-blind approaches, highlighting the importance of capturing and leveraging network effects where they exist.
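The hill-climbing step mentioned in the abstract can be illustrated with a minimal sketch. It greedily adds, one arm at a time, the arm with the largest marginal gain until $k$ arms are chosen. The names `hill_climb`, `coverage`, and `REACH` are hypothetical, and the coverage function is only a stand-in for a submodular $Q(\boldsymbol{s},\cdot)$; this is not the paper's implementation.

```python
from typing import Callable, FrozenSet, Iterable, Set


def hill_climb(candidates: Iterable[int], k: int,
               value: Callable[[FrozenSet[int]], float]) -> Set[int]:
    """Greedily pick up to k arms, each time adding the best marginal gain."""
    chosen: Set[int] = set()
    remaining = set(candidates)
    for _ in range(k):
        if not remaining:
            break
        base = value(frozenset(chosen))
        # Marginal gain of adding each remaining arm to the current set;
        # sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining),
                   key=lambda v: value(frozenset(chosen | {v})) - base)
        chosen.add(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: which nodes each arm would reach if selected.
REACH = {0: {0, 1, 2}, 1: {2, 3}, 2: {3, 4, 5, 6}, 3: {0}}


def coverage(arms: FrozenSet[int]) -> float:
    """Number of distinct nodes reached; a monotone submodular function."""
    covered = set()
    for a in arms:
        covered |= REACH[a]
    return float(len(covered))


selected = hill_climb(REACH.keys(), k=2, value=coverage)
print(selected, coverage(frozenset(selected)))
```

Each round costs one evaluation per remaining arm, so selecting $k$ of $n$ arms needs on the order of $nk$ evaluations rather than enumerating all $\binom{n}{k}$ subsets.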

Paper Structure

This paper contains 36 sections, 6 theorems, 34 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1

Given a submodular value function $V(\boldsymbol{s})$ and a fixed state $\boldsymbol{s}$, $Q(\boldsymbol{s},\boldsymbol{a})$ is submodular with respect to $\boldsymbol{a}$.
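For reference, the standard diminishing-returns definition of submodularity can be written as below. This is a sketch that assumes the binary action vector $\boldsymbol{a}$ is identified with the set $A$ of arms it activates; the symbols $A$, $B$, $N$ (the set of all arms), and $v$ are notation introduced here and may differ from the paper's.

$$
Q(\boldsymbol{s}, A \cup \{v\}) - Q(\boldsymbol{s}, A) \;\ge\; Q(\boldsymbol{s}, B \cup \{v\}) - Q(\boldsymbol{s}, B)
\quad \text{for all } A \subseteq B \subseteq N,\ v \in N \setminus B.
$$

In words, adding an arm to a smaller set helps at least as much as adding it to a larger one. For a set function $f$ that is also monotone and nonnegative, the classical greedy result gives $f(A_{\text{greedy}}) \ge \left(1-\frac{1}{e}\right)\max_{|A|\le k} f(A)$ under a cardinality constraint $k$, which is the form of the bound quoted in the abstract.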

Figures (4)

  • Figure 1: Visual representation of a single Networked RMAB timestep: initial state, selection of $k$ active actions, independent transitions, cascade propagation, and resulting next state.
  • Figure 2: Mean $\pm$ SD fraction of activated nodes over 30 timesteps on the India contact network ($n=202,\ |E|=692;\ k=20;\ 10\ \text{seeds}\times 50\ \text{runs}$) shows the GNN stabilizing near $82\%$ activation and consistently outperforming DQN, Whittle index, 1‑step look‑ahead, and the no‑intervention baseline.
  • Figure 3: Mean $\pm$ SD activation fraction over 30 timesteps on a 10‑node graph. DQN and GNN match tabular Q‑learning’s near‑optimal performance in networked RMABs.
  • Figure 4: Total runtime (per-epoch runtime for DQN and GNN) versus graph size $n$. Tabular Q‑learning's runtime grows exponentially, while the runtimes of DQN and GNN grow linearly.
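The timestep described in the Figure 1 caption (action selection, independent transitions, cascade propagation, next state) can be sketched as follows. This is an illustration under stated assumptions, not the paper's simulator: states are binary (1 = active), `p_up[(s, a)]` is a hypothetical probability that an arm in state `s` under action `a` is active after its independent transition, and only arms that switch from inactive to active in that transition seed the cascade.

```python
import random
from typing import Dict, List, Set, Tuple


def nrmab_step(state: Dict[int, int], action: Set[int],
               edges: List[Tuple[int, int, float]],
               p_up: Dict[Tuple[int, int], float],
               rng: random.Random) -> Dict[int, int]:
    """One timestep: independent per-arm transitions, then a cascade."""
    nxt: Dict[int, int] = {}
    seeds: List[int] = []
    # 1) Independent transitions, as in a standard (non-networked) RMAB.
    for v, s in state.items():
        a = 1 if v in action else 0
        nxt[v] = 1 if rng.random() < p_up[(s, a)] else 0
        if s == 0 and nxt[v] == 1:
            seeds.append(v)  # assumption: only newly activated arms spread
    # 2) Independent cascade: each newly active arm gets one activation
    #    attempt per outgoing edge, succeeding with that edge's probability.
    frontier = seeds
    while frontier:
        new_frontier: List[int] = []
        for u in frontier:
            for src, dst, w in edges:
                if src == u and nxt[dst] == 0 and rng.random() < w:
                    nxt[dst] = 1
                    new_frontier.append(dst)
        frontier = new_frontier
    return nxt


# Hypothetical toy instance with deterministic probabilities (0.0 or 1.0).
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.0)]
toy_p_up = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}
start = {0: 0, 1: 0, 2: 0, 3: 0}
result = nrmab_step(start, {0}, toy_edges, toy_p_up, random.Random(0))
print(result)  # {0: 1, 1: 1, 2: 1, 3: 0}
```

In the toy instance, acting on arm 0 activates it, the cascade then reaches arms 1 and 2 through the probability-1 edges, and arm 3 stays inactive because its incoming edge has probability 0.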

Theorems & Definitions (14)

  • Theorem 1: Submodularity
  • proof : Proof Sketch
  • Definition 1: Bellman Operator for Bellman Equation with Hill-Climbing Action Selection
  • Theorem 2: Contraction
  • proof : Proof Sketch
  • Definition 2: Multi-Bellman Operator for the Hill-Climbing Variant
  • Theorem 3: Hill-Climbing Equivalence
  • proof : Proof Sketch
  • Theorem 3: Submodularity
  • proof : Proof
  • ...and 4 more
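
As a reminder of what a contraction statement such as Theorem 2 typically asserts, the generic definition is sketched below. The operator symbol $\mathcal{T}$ and the functions $Q_1$, $Q_2$ are placeholders introduced here; the paper's hill-climbing and multi-Bellman operators are not reproduced.

$$
\lVert \mathcal{T} Q_1 - \mathcal{T} Q_2 \rVert_\infty \;\le\; \gamma\, \lVert Q_1 - Q_2 \rVert_\infty, \qquad 0 \le \gamma < 1.
$$

By the Banach fixed-point theorem, an operator with this property on bounded functions has a unique fixed point, and repeated application converges to it from any starting point.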