Reinforcement Learning in Non-Markovian Environments

Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

TL;DR

The paper studies reinforcement learning in non-Markovian environments. It explicitly characterizes the error that non-Markovian observations introduce into Q-learning and argues that agents should be designed to approximate certain conditional laws well, using recursively computable approximate sufficient statistics (RCASS). An autoencoder-based scheme computes these statistics, yielding a Non-Markovian Q-Agent (NMQ) that combines RCASS with a Deep Q-Network to handle partial observability. The approach is tested on partially observed tasks where standard DQN struggles. The work connects reinforcement learning to classical stochastic control concepts (separated control, nonlinear filtering) and suggests practical agent designs for non-Markovian settings. Future directions include RKHS-based distribution embeddings and extensions beyond stationarity.

Abstract

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.
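
To make the idea of running Q-learning on a recursively computed agent state concrete, here is a minimal tabular sketch. It is not the paper's autoencoder-based scheme. The map `update_agent_state`, the toy reward rule, and all hyperparameters are assumptions chosen for illustration: the agent state is simply the last two observations, and the reward depends on the observation before the current one, so the current observation alone is not enough to act well.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)

def update_agent_state(state, action, observation, window=2):
    # Hypothetical recursive map s_{n+1} = f(s_n, a_n, y_{n+1}).
    # This toy f ignores the action and keeps the last `window` observations.
    return (state + (observation,))[-window:]

def greedy_action(Q, state):
    # Action with the largest Q-value at this agent state (ties go to action 0).
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q-values indexed by (agent_state, action)
    prev_obs, obs = rng.randint(0, 1), rng.randint(0, 1)
    state = update_agent_state((), None, prev_obs)
    state = update_agent_state(state, None, obs)
    for _ in range(steps):
        # Epsilon-greedy exploration on the agent state.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy_action(Q, state)
        # Toy reward (assumption): 1 if the action matches the observation
        # seen one step before the current one, else 0.
        reward = 1.0 if action == prev_obs else 0.0
        # The toy environment emits a fresh random bit as the next observation.
        prev_obs, obs = obs, rng.randint(0, 1)
        next_state = update_agent_state(state, action, obs)
        # Standard Q-learning update, applied to agent states.
        target = reward + gamma * max(Q[(next_state, b)] for b in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
    return Q

if __name__ == "__main__":
    Q = train()
    for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(s, greedy_action(Q, s))
```

In this toy setting the two-observation window happens to be an exact sufficient statistic for the reward. The paper's concern is the general case, where such a statistic must be approximated and learned.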

Paper Structure

This paper contains 12 sections, 6 theorems, 68 equations, and 2 figures.

Key Result

Theorem 1

The iterates $\{Q_n\}$ from (Qlearn) converge a.s. to $Q^*$ where $Q^*$ is the unique solution to the system of equations

Figures (2)

  • Figure 1: Learning the agent state dynamics using autoencoders
  • Figure 2: Moving average of episodic reward for a single run of Non-Markovian Q-agent and the DQN-agent on the environments (a) cartpole, (b) mountain car and (c) Non-Markovian random walk. The dotted lines represent the reward obtained in each episode.

Theorems & Definitions (15)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Proof
  • Proposition 2
  • Theorem 3
  • Proof
  • Theorem 4
  • Proof (sketch)
  • Remark 3
  • ...and 5 more