Reinforcement Learning for Respondent-Driven Sampling
Justin Weltz, Angela Yoon, Yichi Zhang, Alexander Volfovsky, Eric Laber
TL;DR
This work develops RL-RDS, a principled framework for adaptively allocating RDS incentives using a branching-process working model and Thompson sampling with clipping to maximize cumulative study utility under budget constraints. It rigorously derives asymptotic regret bounds and establishes consistency and convergence rates for online parameter estimation under complete-generation sampling, while enabling valid post-adaptive inference via projection confidence sets even when the model is not identifiable. The authors demonstrate substantial efficiency gains over static and two-stage designs in simulations and show that their projection-based inference achieves nominal coverage under varying graph densities, with robust finite-sample performance. The approach advances adaptive sampling in hidden populations by combining online learning, model-based exploration, and valid inference, and it outlines extensions to generalized RDS inference and ridge-regularized estimation to accommodate misspecification and high-dimensional settings.
Abstract
Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, and call to action. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the study. Thus, it does not make use of accumulating information on which incentives are effective and for whom. We propose a reinforcement learning (RL) based adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient and cost-effective and can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference, which is non-trivial due to the adaptive sampling induced by RL as well as the complex dependencies among subjects due to latent (unobserved) social network structure. We provide asymptotic regret bounds for the proposed design and illustrate its finite-sample behavior through a suite of simulation experiments.
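To make the idea of adaptively allocating incentives concrete, the following is a minimal illustrative sketch of Thompson sampling with clipped selection probabilities. It is not the authors' RL-RDS implementation. It assumes a deliberately simplified working model in which each incentive type yields a Poisson number of recruits with a conjugate Gamma prior on its mean, and it enforces the clipping floor by mixing the Thompson probabilities with a uniform distribution. The function names, the `clip` and `budget` parameters, and the example recruitment means are all hypothetical.

```python
import numpy as np


def clipped_thompson_probs(shape, rate, clip=0.05, n_draws=2000, rng=None):
    """Estimate Thompson selection probabilities and floor them at `clip`.

    Assumption: the mean number of recruits under incentive a has a
    Gamma(shape[a], rate[a]) posterior (Poisson-Gamma conjugacy).
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = np.asarray(shape, dtype=float)
    rate = np.asarray(rate, dtype=float)
    k = shape.size
    if clip * k > 1.0:
        raise ValueError("clip must satisfy clip * n_arms <= 1")
    # Posterior draws of the mean recruits per incentive, one row per draw.
    draws = rng.gamma(shape, 1.0 / rate, size=(n_draws, k))
    # Fraction of draws in which each incentive looks best.
    best = draws.argmax(axis=1)
    probs = np.bincount(best, minlength=k) / n_draws
    # Mix with a uniform distribution so every probability is at least `clip`.
    return clip + (1.0 - k * clip) * probs


def simulate(true_means, budget=200, clip=0.05, seed=0):
    """Issue `budget` incentives one at a time, updating the posterior."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    shape = np.ones(k)  # Gamma(1, 1) priors (an assumption)
    rate = np.ones(k)
    counts = np.zeros(k, dtype=int)
    total_recruits = 0
    for _ in range(budget):
        probs = clipped_thompson_probs(shape, rate, clip=clip, rng=rng)
        arm = rng.choice(k, p=probs)
        recruits = rng.poisson(true_means[arm])
        # Conjugate update: add observed recruits and one unit of exposure.
        shape[arm] += recruits
        rate[arm] += 1.0
        counts[arm] += 1
        total_recruits += recruits
    return counts, total_recruits


if __name__ == "__main__":
    # Hypothetical recruitment means for two incentive types.
    counts, total = simulate([0.5, 2.0])
    print("incentives issued per type:", counts)
    print("total recruits:", total)
```

The floor keeps every incentive's selection probability bounded away from zero, so each incentive type continues to be sampled throughout the study. The sketch ignores the network dependence among participants and the wave structure of a real RDS study.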
