Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach
Zhiyuan Yao, Ionut Florescu, Chihoon Lee
TL;DR
This paper develops Stochastic Model Based Simulation (SMBS) to control systems with delayed feedback and stochastic transitions by sampling multiple possible target states from a probabilistic environment model. The action policy scores each action $a$ by $\bar{Q}_M(a) - \alpha\,\hat{Q}_M(a)$. This combines the mean Q-value over the $M$ sampled states, $\bar{Q}_M(a)$, with a risk penalty $\hat{Q}_M(a)$ weighted by $\alpha$, and it enables risk-aware planning in delay-prone settings. Across classic control tasks and Atari environments, SMBS is robust and often outperforms AMDP and Delayed-Q. Its risk parameter $\alpha$ provides tunable conservatism under uncertainty. Theoretical results establish equivalence to AMDP in deterministic cases and provide probabilistic error bounds as the number of samples grows, which supports practical use in real-world delayed control scenarios.
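The risk-aware selection rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the Q-values of every action have already been evaluated at each of the $M$ sampled target states. It also assumes the risk term $\hat{Q}_M(a)$ is the population standard deviation of those values. That choice of dispersion measure, and the function name `risk_aware_action`, are hypothetical.

```python
import statistics
from typing import Sequence


def risk_aware_action(q_samples: Sequence[Sequence[float]], alpha: float) -> int:
    """Return the action maximizing mean(Q) - alpha * spread(Q) over M samples.

    q_samples[m][a] is the Q-value of action a at the m-th sampled target state.
    ASSUMPTION: the spread is the population standard deviation; the paper's
    exact definition of the risk term may differ.
    """
    if not q_samples:
        raise ValueError("need at least one sampled state")
    num_actions = len(q_samples[0])
    scores = []
    for a in range(num_actions):
        column = [row[a] for row in q_samples]   # Q(a) across the M samples
        mean = statistics.fmean(column)          # plays the role of \bar{Q}_M(a)
        spread = statistics.pstdev(column)       # assumed stand-in for \hat{Q}_M(a)
        scores.append(mean - alpha * spread)
    return max(range(num_actions), key=lambda a: scores[a])


# Action 0 has a higher mean but is volatile; action 1 is certain.
q = [[2.0, 0.8],
     [0.0, 0.8]]
print(risk_aware_action(q, alpha=0.0))  # 0 (risk-neutral: highest mean wins)
print(risk_aware_action(q, alpha=0.5))  # 1 (risk-averse: certain action wins)
```

Raising `alpha` shifts the choice toward actions whose value is consistent across the sampled states. This is the tunable conservatism the parameter is meant to provide.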
Abstract
In this paper, we introduce a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, whereas previous methods used deterministic planning. This allows us to embed a risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from the literature. We first apply the methodology to simple tasks to understand its features, and then compare the performance of the methods in controlling multiple Atari games.
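The stochastic planning step can be illustrated with a minimal sketch. Assume the agent's last observation is `last_obs` and it has since issued `pending_actions` whose effects it has not yet seen. Candidate current states are then drawn by replaying those actions through a probabilistic model several times. The 1-D additive-Gaussian dynamics in `make_toy_model` and the helper names are hypothetical stand-ins for a learned environment model, not the paper's code. Setting `noise_std=0` makes every sample identical. This shows why stochastic planning collapses to single-state planning when transitions are deterministic.

```python
import random
from typing import Callable, List, Sequence

# HYPOTHETICAL stand-in for a learned probabilistic model:
# (state, action, rng) -> one sampled next state.
Model = Callable[[float, int, random.Random], float]


def make_toy_model(noise_std: float) -> Model:
    """Toy dynamics s' = s + a + N(0, noise_std^2); noise_std=0 is deterministic."""
    def model(s: float, a: int, rng: random.Random) -> float:
        return s + a + rng.gauss(0.0, noise_std)
    return model


def sample_current_states(model: Model, last_obs: float,
                          pending_actions: Sequence[int],
                          num_samples: int, seed: int = 0) -> List[float]:
    """Draw num_samples candidate current states by rolling the model forward
    through the actions taken since the last (delayed) observation."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        s = last_obs
        for a in pending_actions:  # actions whose effects are not yet observed
            s = model(s, a, rng)
        samples.append(s)
    return samples


# Deterministic transitions: all M samples coincide, so there is no spread.
det = sample_current_states(make_toy_model(0.0), 0.0, [1, -1, 1], num_samples=4)
print(det)  # [1.0, 1.0, 1.0, 1.0]
```

With a noisy model the samples differ, and that spread is what a risk penalty can act on. With a deterministic model, a single rollout carries all the information.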
