Reinforcement Learning for Respondent-Driven Sampling

Justin Weltz; Angela Yoon; Yichi Zhang; Alexander Volfovsky; Eric Laber

TL;DR

This work develops RL-RDS, a principled framework for adaptively allocating RDS incentives using a branching-process working model and Thompson sampling with clipping to maximize cumulative study utility under budget constraints. It rigorously derives asymptotic regret bounds and establishes consistency and convergence rates for online parameter estimation under complete-generation sampling, while enabling valid post-adaptive inference via projection confidence sets even when the model is not identifiable. The authors demonstrate substantial efficiency gains over static and two-stage designs in simulations and show that their projection-based inference achieves nominal coverage under varying graph densities, with robust finite-sample performance. The approach advances adaptive sampling in hidden populations by combining online learning, model-based exploration, and valid inference, and it outlines extensions to generalized RDS inference and ridge-regularized estimation to accommodate misspecification and high-dimensional settings.

Abstract

Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, and call to action. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the study. Thus, it does not make use of accumulating information on which incentives are effective and for whom. We propose a reinforcement learning (RL) based adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient and cost-effective, and that they can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference, which is non-trivial due to the adaptive sampling induced by RL as well as the complex dependencies among subjects due to latent (unobserved) social network structure. We provide asymptotic regret bounds and illustrate the finite-sample behavior of the proposed design through a suite of simulation experiments.
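To make the idea of Thompson sampling with clipping over a branching-process working model more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each incentive "arm" yields a Poisson number of recruits per participant, that the prior on each arm's mean is Gamma(1, 1), that utility is simply the number of recruits, and that the budget is a cap on total sample size. The function names `clipped_thompson_probs` and `simulate_rds` and all parameter values are hypothetical.

```python
import numpy as np

def clipped_thompson_probs(alpha, beta, eps=0.05, n_draws=2000, rng=None):
    """Monte Carlo estimate of Thompson selection probabilities, floored at eps."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(alpha)
    assert k * eps < 1.0, "eps is too large for this many arms"
    # Posterior draws of each arm's mean recruits: Gamma(shape=alpha, rate=beta).
    draws = rng.gamma(shape=alpha, scale=1.0 / beta, size=(n_draws, k))
    # Fraction of draws in which each arm has the largest sampled mean.
    probs = np.bincount(draws.argmax(axis=1), minlength=k) / n_draws
    # Mix with the uniform distribution so every arm keeps probability >= eps
    # (one simple way to clip; the result still sums to one).
    return (1.0 - k * eps) * probs + eps

def simulate_rds(true_means, n_seeds=10, max_n=500, eps=0.05, seed=0):
    """Toy generation-by-generation recruitment with adaptive incentives."""
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    alpha, beta = np.ones(k), np.ones(k)      # assumed Gamma(1, 1) priors
    arm_counts = np.zeros(k, dtype=int)
    current, total = n_seeds, n_seeds
    while current > 0 and total < max_n:
        probs = clipped_thompson_probs(alpha, beta, eps, rng=rng)
        # Assign an incentive to every participant in the current generation.
        arms = rng.choice(k, size=current, p=probs)
        recruits = rng.poisson(true_means[arms])
        # Conjugate Gamma-Poisson update, applied once the generation is complete.
        np.add.at(alpha, arms, recruits)
        np.add.at(beta, arms, 1.0)
        arm_counts += np.bincount(arms, minlength=k)
        # The next generation is the set of recruits, truncated at the size cap.
        current = int(min(recruits.sum(), max_n - total))
        total += current
    return alpha / beta, arm_counts, total

posterior_means, arm_counts, total = simulate_rds([0.8, 1.3])
print(posterior_means, arm_counts, total)
```

The floor `eps` keeps every incentive in play throughout the study, which is the role clipping usually serves when inference is needed after adaptive sampling. Updating the posterior only at the end of each generation is a simplification chosen here to echo the complete-generation sampling mentioned in the TL;DR.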

Paper Structure (53 sections, 28 theorems, 382 equations, 5 figures, 3 tables, 3 algorithms)

Setup and Notation
Reinforcement Learning for RDS
Asymptotic Regret Bounds
A Branching Process Example
Inference for RL-RDS
RL-RDS Simulations
Policies
Results
Discussion
Glossary
Inference Algorithms
Proof of Theorem \ref{thm:asympNorm}
Supporting Martingale Limit Theory
Consistency
Convergence Rate
...and 38 more sections

Key Result

Theorem 2.1

For $\delta > 0$, define the event $E_\mathscr{I} = \left\{ \mathscr{I} > \delta \right\}$, where $\mathscr{I}$ is defined in Assumption \ref{as:generation_asymptotics}. Under Assumptions \ref{as:1}-\ref{as:equicontinuity}, as $J \to \infty$, on event $E_\mathscr{I}$.

Figures (5)

Figure 1: RDS is a complex stochastic process that samples without replacement over a social network. The observed RDS sample is composed of coupon exchanges, illustrated by arrows ($\rightarrow$). The unobserved connections between sample participants are represented as dashed lines. The observed data resembles a branching process.
Figure 2: This figure compares the estimated cumulative reward of each policy with 90% Monte Carlo confidence intervals over multiple sample sizes and graph densities.
Figure 3: This figure compares the estimated cumulative reward of each policy with 90% Monte Carlo confidence intervals over multiple sample sizes and graph densities.
Figure 4: This figure compares the estimated cumulative reward of each policy with 90% Monte Carlo confidence intervals in simulation setting 1.
Figure 5: This figure compares the estimated cumulative reward of each policy with 90% Monte Carlo confidence intervals in simulation setting 2.

Theorems & Definitions (50)

Theorem 2.1
Theorem 2.2
Theorem 2.3
Theorem 3.1
Theorem 8.1: Theorem 2.19 from hall2014martingale
Theorem 8.2: Theorem 2.17 from hall2014martingale
Theorem 8.3: Theorem 2.18 from hall2014martingale
Lemma 8.1
Theorem 8.4
proof
...and 40 more
