Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
TL;DR
This work surveys the role of statistical inference in reinforcement learning, focusing on hypothesis testing for the Markov property and off-policy confidence interval estimation. It introduces forward-backward generative learning to construct doubly robust tests for conditional independence and Markov assumptions, enabling robust model selection (MDP versus higher-order MDP or POMDP) in offline RL. Across diabetes and Tiger problem case studies, the framework demonstrates practical identification of MDP order and improved policy evaluation under correct model assumptions. The contributions bridge classical statistical tools with RL practice, offering scalable methods for uncertainty quantification and model validation in sequential decision problems.
Abstract
Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health status. In ride-sharing platforms, applying RL algorithms could increase drivers' income and customer satisfaction. For large language models, applying RL algorithms could align their outputs with human preferences. Over the past decade, RL has arguably been one of the most vibrant research frontiers in machine learning. Nevertheless, statistics as a field, as opposed to computer science, has only recently begun to engage with RL both in depth and in breadth. This chapter presents a selective review of statistical inferential tools for RL, covering both hypothesis testing and confidence interval construction. Our goal is to highlight the value of statistical inference in RL for both the statistics and machine learning communities, and to promote the broader application of classical statistical inference tools in this vibrant area of research.
