
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program

Arpan Dasgupta, Gagan Jain, Arun Suggala, Karthikeyan Shanmugam, Milind Tambe, Aparna Taneja

TL;DR

This work proposes a principled Bayesian approach using Thompson Sampling for the collaborative bandit problem of timing automated health information calls, and demonstrates significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program.

Abstract

Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries. This challenge has been formulated as a collaborative multi-armed bandit problem, which requires online learning of a low-rank reward matrix. Existing solutions often rely on heuristic combinations of offline matrix completion and exploration strategies. In this work, we propose a principled Bayesian approach using Thompson Sampling for this collaborative bandit problem. Our method leverages prior information through efficient Gibbs sampling for posterior inference over the low-rank matrix factors, enabling faster convergence. We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program. Our approach achieves a $16\%$ reduction in the number of calls compared to existing methods and a $47\%$ reduction compared to the deployed random policy. This efficiency gain translates to a potential increase in program capacity of $0.5$ to $1.4$ million beneficiaries, granting them access to vital antenatal and postnatal care information. Furthermore, we observe improvements in beneficiary retention (an extremely hard metric to impact) of $7\%$ and $29\%$ compared to state-of-the-art and deployed baselines, respectively. Synthetic simulations further demonstrate the superiority of our approach, particularly in low-data regimes and in effectively utilizing prior information. We also provide a theoretical analysis of our algorithm in a special setting using the Eluder dimension.
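
The abstract describes Thompson Sampling in which posterior samples of low-rank user and arm (time-slot) factors are drawn by Gibbs sampling. The sketch below illustrates that general pattern under stated assumptions; it is not the authors' implementation. It assumes a Gaussian likelihood r_ij ~ N(u_i · v_j, sigma2) with sigma2 = 0.25 and isotropic Gaussian priors N(0, I / lam) on every factor row, even though real pick-ups are binary. The functions `sample_row`, `gibbs_sweep`, and `thompson_collab_bandit`, along with all default parameter values, are hypothetical. The paper's actual model, priors, and sampling schedule may differ.

```python
import numpy as np


def sample_row(other, idx, r, lam, sigma2, rng):
    """Draw one latent row from its Gaussian conditional posterior.

    other: factor matrix of the opposite side (arm factors when sampling a user)
    idx:   rows of `other` paired with this row's observations
    r:     observed rewards for those pairs
    """
    k = other.shape[1]
    X = other[idx]
    prec = lam * np.eye(k) + X.T @ X / sigma2       # posterior precision
    mean = np.linalg.solve(prec, X.T @ r / sigma2)  # posterior mean
    L = np.linalg.cholesky(prec)                    # prec = L @ L.T
    # L^{-T} z has covariance prec^{-1} when z ~ N(0, I)
    return mean + np.linalg.solve(L.T, rng.standard_normal(k))


def gibbs_sweep(U, V, users, arms, rewards, lam=1.0, sigma2=0.25, rng=None):
    """One Gibbs sweep: resample each user row given V, then each arm row given U."""
    rng = rng if rng is not None else np.random.default_rng()
    for i in range(U.shape[0]):
        m = users == i
        U[i] = sample_row(V, arms[m], rewards[m], lam, sigma2, rng)
    for j in range(V.shape[0]):
        m = arms == j
        V[j] = sample_row(U, users[m], rewards[m], lam, sigma2, rng)
    return U, V


def thompson_collab_bandit(P_true, rank=2, rounds=20, sweeps=3, seed=0):
    """Each round: refresh the posterior sample with a few Gibbs sweeps, then call
    every user at the slot that is best under that single sample."""
    rng = np.random.default_rng(seed)
    n_users, n_arms = P_true.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_arms, rank))
    users = np.empty(0, dtype=int)
    arms = np.empty(0, dtype=int)
    rewards = np.empty(0)
    choice = np.zeros(n_users, dtype=int)
    for _ in range(rounds):
        for _ in range(sweeps):
            U, V = gibbs_sweep(U, V, users, arms, rewards, rng=rng)
        choice = np.argmax(U @ V.T, axis=1)  # Thompson step: greedy on one sample
        # Simulated Bernoulli pick-ups from the (unknown to the learner) true matrix
        picked = rng.random(n_users) < P_true[np.arange(n_users), choice]
        users = np.concatenate([users, np.arange(n_users)])
        arms = np.concatenate([arms, choice])
        rewards = np.concatenate([rewards, picked.astype(float)])
    return choice, (users, arms, rewards)
```

Warm-starting the chain from the previous round's sample (rather than restarting it) keeps the number of sweeps per round small. This is a common design choice and an assumption here.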

Paper Structure

This paper contains 21 sections, 6 theorems, 13 equations, 7 figures, and 4 algorithms.

Key Result

Theorem 1

The parameter updates for one user are independent of those for the other users.
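
One way to read Theorem 1 is through the conditional posterior of a single user's latent factor. The sketch below illustrates that reading under an assumed Gaussian model, r_ij ~ N(u_i · v_j, sigma2) with prior u_i ~ N(0, I / lam). Once the arm factors V are fixed, user i's posterior depends only on user i's own observations, so the per-user updates can run in any order or in parallel. The functions `user_posterior` and `all_user_posteriors` are hypothetical and are not taken from the paper.

```python
import numpy as np


def user_posterior(V, arms_i, rewards_i, lam=1.0, sigma2=0.25):
    """Mean and precision of one user's factor given the arm factors V,
    using only that user's observations (assumed Gaussian model)."""
    k = V.shape[1]
    X = V[arms_i]
    prec = lam * np.eye(k) + X.T @ X / sigma2
    mean = np.linalg.solve(prec, X.T @ rewards_i / sigma2)
    return mean, prec


def all_user_posteriors(V, users, arms, rewards, n_users, **kw):
    """Each call reads only rows belonging to user i, so this loop could be
    reordered or parallelized without changing any result."""
    return [user_posterior(V, arms[users == i], rewards[users == i], **kw)
            for i in range(n_users)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.normal(size=(5, 2))
    users = np.array([0, 0, 1, 1, 1])
    arms = np.array([0, 3, 1, 2, 4])
    rewards = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
    before = all_user_posteriors(V, users, arms, rewards, n_users=2)
    changed = rewards.copy()
    changed[users == 1] = 0.0  # alter only user 1's data
    after = all_user_posteriors(V, users, arms, changed, n_users=2)
    print(np.allclose(before[0][0], after[0][0]))  # True: user 0's posterior is unchanged
```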

Figures (7)

  • Figure 1: (Left) Example pick-up matrix: each entry, such as 0.18 (top left), represents the likelihood of a call being answered by the first beneficiary in the first time slot. (Right) Decomposition of the matrix into a user × user-type matrix and a user-type × pick-up-probability matrix.
  • Figure 2: Regret for the low-rank case on simulated data: average regret of different methods over $15$ random matrices. Results are for a matrix with $1000$ users, $20$ arms, and $4$ user types. Each time step adds $1000$ samples.
  • Figure 3: Number of attempts needed to reach beneficiaries in the real-world ARMMAN dataset across $3$ listenership buckets. All plots are relative to the deployed random baseline, capped at $100\%$.
  • Figure 4: Percentage of dropoffs over a $4$-month period. A dropoff occurs when engagement falls below $25\%$ for $6$ consecutive weeks or for $9$ weeks within a $12$-week period.
  • Figure 5: Average regret for a single run for different ranks on the real world dataset. Increasing the rank reduces regret but only marginally beyond $C = 5$.
  • ...and 2 more figures
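
Figures 1 and 2 describe a pick-up matrix that factors into a user × user-type matrix and a user-type × time-slot matrix, together with the regret of the calling policies. The sketch below builds such a matrix at the sizes quoted for Figure 2 ($1000$ users, $20$ arms, $4$ user types) and computes the standard per-step expected regret. Some details are assumptions: each user belongs to exactly one type (one-hot rows), the per-type rates are drawn uniformly from a made-up range, and each user receives one call per step. `make_pickup_matrix` and `step_regret` are hypothetical names.

```python
import numpy as np


def make_pickup_matrix(n_users=1000, n_arms=20, n_types=4, seed=0):
    """Pick-up matrix = (user x user type) @ (user type x time slot).
    One-hot membership and the rate range are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    membership = np.eye(n_types)[rng.integers(n_types, size=n_users)]  # (n_users, n_types)
    type_rates = rng.uniform(0.05, 0.6, size=(n_types, n_arms))         # (n_types, n_arms)
    return membership @ type_rates, membership, type_rates


def step_regret(P, chosen_arms):
    """Expected regret of one step: for each user, the best pick-up rate minus
    the pick-up rate of the slot actually called, summed over users."""
    users = np.arange(P.shape[0])
    return float(np.sum(P.max(axis=1) - P[users, chosen_arms]))


P, membership, type_rates = make_pickup_matrix()
print(np.linalg.matrix_rank(P) <= 4)      # rank is bounded by the number of user types
print(step_regret(P, P.argmax(axis=1)))   # an oracle policy has zero regret
```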
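
The dropoff rule in the Figure 4 caption combines two conditions, and it is easy to misapply. The sketch below encodes it under a few assumptions. Engagement is represented as one fraction in [0, 1] per week. "Below $25\%$" is treated as strictly less than 0.25. A history shorter than 12 weeks is checked as a single partial window. `is_dropoff` is a hypothetical helper, not code from the paper.

```python
def is_dropoff(weekly_engagement, threshold=0.25, consecutive=6,
               window=12, min_low_in_window=9):
    """True if engagement is below `threshold` for `consecutive` weeks in a row,
    or for at least `min_low_in_window` weeks within any `window`-week span."""
    low = [e < threshold for e in weekly_engagement]

    # Condition 1: a long enough run of consecutive low weeks
    run = 0
    for flag in low:
        run = run + 1 if flag else 0
        if run >= consecutive:
            return True

    # Condition 2: enough low weeks inside some sliding window
    for start in range(max(1, len(low) - window + 1)):
        if sum(low[start:start + window]) >= min_low_in_window:
            return True
    return False


# 9 low weeks out of 12, but never 6 in a row: counts as a dropoff via the window rule
print(is_dropoff([0.1] * 5 + [0.5] + [0.1] * 4 + [0.5] * 2))
```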

Theorems & Definitions (7)

  • Theorem 1
  • Definition 1: Eluder dimension (Russo and Van Roy, 2013)
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5: Restated
  • Theorem 6: Restated