Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach

Zhiyuan Yao, Ionut Florescu, Chihoon Lee

TL;DR

This paper develops Stochastic Model Based Simulation (SMBS) to control systems with delayed feedback and stochastic transitions by sampling multiple possible target states from a probabilistic environment model. The action policy combines mean Q-values with a risk penalty, $\bar{Q}_M(a) - \alpha \hat{Q}_M(a)$, enabling risk-aware planning in delay-prone settings. SMBS is robust and often outperforms AMDP and Delayed-Q across classic control tasks and Atari environments, and its risk parameter $\alpha$ provides tunable conservatism under uncertainty. Theoretical results establish equivalence to AMDP in deterministic cases and provide probabilistic error bounds as the number of samples grows, supporting practical applicability in real-world delayed control scenarios.
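
The mean-minus-penalty rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy Q-function, and the toy models are hypothetical. This summary does not define the penalty term $\hat{Q}_M(a)$, so the sketch assumes it is the sample standard deviation of the Q-values across the $M$ sampled states.

```python
import numpy as np


def smbs_action(q_fn, sample_next_state, last_obs, pending_actions,
                n_samples, alpha, rng):
    """Score actions by mean Q minus alpha times a risk term (hypothetical sketch).

    q_fn(state) returns one Q-value per action; sample_next_state(state, a, rng)
    draws one successor state from a stochastic environment model.
    """
    q_samples = []
    for _ in range(n_samples):
        state = last_obs
        # Roll the model forward through actions already issued but whose
        # effects have not been observed yet because of the delay.
        for a in pending_actions:
            state = sample_next_state(state, a, rng)
        q_samples.append(q_fn(state))
    q = np.asarray(q_samples, dtype=float)   # shape: (n_samples, n_actions)
    q_mean = q.mean(axis=0)
    # ASSUMPTION: the risk term is the sample standard deviation over samples.
    q_risk = q.std(axis=0)
    scores = q_mean - alpha * q_risk
    return int(np.argmax(scores)), scores


# Toy setup (all hypothetical): action 0 has a fixed value of 1.0, while
# action 1 has a higher value that depends on the uncertain current state.
def toy_q(state):
    return [1.0, 2.0 + state]


def toy_noisy_model(state, action, rng):
    # The state moves up or down by one with equal probability.
    return state + rng.choice([-1.0, 1.0])


def toy_deterministic_model(state, action, rng):
    # The state always moves up by one, so every sample is identical.
    return state + 1.0
```

In this toy setup, `alpha = 0` ranks actions by mean Q-value alone, while a larger `alpha` shifts the choice toward the action whose value varies less across sampled states. With `toy_deterministic_model` all samples coincide, the penalty is zero, and the rule reduces to a greedy choice at the single predicted state.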

Abstract

In this paper we introduce a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, whereas previous methods used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from the literature. We apply the methodology to simple tasks to understand its features, and then compare the performance of the methods in controlling multiple Atari games.

Paper Structure (13 sections, 2 theorems, 21 equations, 9 figures, 2 algorithms)

Key Result

Theorem 1

Assume a discrete-time MDP with an infinite time horizon whose transitions are deterministic, i.e., for every $(s, a)\in \mathcal{S}\times \mathcal{A}$ and every $t\geq 0$ there exists an $s'\in\mathcal{S}$ such that $P(S_{t+1} = s'\mid S_t = s, A_t = a) = 1$. Then, the policy … where $\tilde{q}^*$ denotes the optimal Q-function for the AMDP.

Figures (9)

  • Figure 1: Illustration of control in real-time applications.
  • Figure 2: The stochastic environment evolution for 5 delay steps.
  • Figure 3: An illustration of the policy function of the SMBS method.
  • Figure 4: Illustrations of tasks used for comparison.
  • Figure 5: Illustrations of tasks used for comparison.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • proof
  • proof