Generalized Fitted Q-Iteration with Clustered Data
Liyuan Hu, Jitao Wang, Zhenke Wu, Chengchun Shi
TL;DR
This paper addresses reinforcement learning with cluster-structured data and proposes generalized fitted Q-iteration (GFQI), which integrates generalized estimating equations to handle intra-cluster correlations. The Q-function and policy estimators are shown to be optimal when the correlation structure is correctly specified and consistent when it is mis-specified. Empirically, GFQI yields substantial gains over standard FQI (approximately 50% average regret reduction, up to 80% under strong correlations) in simulations and mobile-health analyses. By accounting for cluster structure, GFQI improves sample efficiency for policy learning in healthcare and other domains with clustered data.
Abstract
This paper focuses on reinforcement learning (RL) with clustered data, which is commonly encountered in healthcare applications. We propose a generalized fitted Q-iteration (FQI) algorithm that incorporates generalized estimating equations into policy learning to handle intra-cluster correlations. Theoretically, we demonstrate (i) the optimality of our Q-function and policy estimators when the correlation structure is correctly specified, and (ii) their consistency when the structure is mis-specified. Empirically, through simulations and analyses of a mobile health dataset, we find that the proposed generalized FQI reduces regret by half on average compared with standard FQI.
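To make the idea concrete, the following is a minimal sketch of how a generalized estimating equation could replace the least-squares step inside fitted Q-iteration. It is not the authors' implementation. It assumes a linear Q-function, equal cluster sizes, and an exchangeable working correlation with a known parameter `rho`. The function names (`exchangeable_inverse`, `gee_weighted_fit`, `generalized_fqi_sketch`) and the array layout are hypothetical choices made for illustration. In practice `rho` would typically be estimated from residuals rather than fixed.

```python
import numpy as np


def exchangeable_inverse(m, rho):
    """Inverse of the m x m exchangeable working correlation (1 - rho) I + rho 11^T."""
    V = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
    return np.linalg.inv(V)


def gee_weighted_fit(features, targets, rho):
    """Solve sum_i Phi_i^T V^{-1} (y_i - Phi_i beta) = 0 for beta.

    features: (n_clusters, m, d) array of Phi_i; targets: (n_clusters, m) array of y_i.
    Assumption: every cluster has the same size m and shares one working correlation.
    """
    _, m, _ = features.shape
    V_inv = exchangeable_inverse(m, rho)
    A = np.einsum("imd,mk,ike->de", features, V_inv, features)
    b = np.einsum("imd,mk,ik->d", features, V_inv, targets)
    return np.linalg.solve(A, b)


def generalized_fqi_sketch(phi, rewards, phi_next, gamma=0.9, rho=0.5, n_iters=50):
    """Fitted Q-iteration whose regression step uses the GEE-weighted fit above.

    phi:      (n_clusters, m, d) features of the observed (state, action) pairs
    rewards:  (n_clusters, m) observed rewards
    phi_next: (n_clusters, m, n_actions, d) features of the next state under each action
    """
    beta = np.zeros(phi.shape[-1])
    for _ in range(n_iters):
        q_next = phi_next @ beta                         # (n_clusters, m, n_actions)
        targets = rewards + gamma * q_next.max(axis=-1)  # Bellman targets
        beta = gee_weighted_fit(phi, targets, rho)
    return beta
```

Setting `rho=0` makes the working correlation the identity, so each iteration reduces to the ordinary least-squares update of standard FQI. A nonzero `rho` down-weights redundant information from correlated observations in the same cluster.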
