
Curricula for Learning Robust Policies with Factored State Representations in Changing Environments

Panayiotis Panayiotou, Özgür Şimşek

TL;DR

This paper experimentally demonstrates three simple curricula that can significantly enhance policy robustness; one of them varies, between episodes, only the variable with the highest regret. The results offer practical insights for reinforcement learning in complex environments.

Abstract

Robust policies enable reinforcement learning agents to adapt to and operate in unpredictable, dynamic, and ever-changing real-world environments. Factored representations, which break down complex state and action spaces into distinct components, can improve generalization and sample efficiency in policy learning. In this paper, we explore how the curriculum of an agent using a factored state representation affects the robustness of the learned policy. We experimentally demonstrate three simple curricula that can significantly enhance policy robustness; one of them varies, between episodes, only the variable with the highest regret. These results offer practical insights for reinforcement learning in complex environments.
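The highest-regret curriculum can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the configuration dictionary, the function names `update_regret` and `next_config`, and the regret estimator (an exponential moving average of the gap between an optimal return and the agent's return, credited to the variable that was varied) are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical representation: variable name -> value for one episode,
# e.g. {"goal": (3, 3), "holes": "layout_a"}.
EnvConfig = Dict[str, object]


def update_regret(regret: Dict[str, float], var: str, optimal_return: float,
                  achieved_return: float, alpha: float = 0.1) -> None:
    """Assumed estimator: exponential moving average of the return gap
    (optimal minus achieved), credited to the variable `var` that was varied."""
    gap = optimal_return - achieved_return
    regret[var] = (1 - alpha) * regret.get(var, 0.0) + alpha * gap


def next_config(current: EnvConfig, domains: Dict[str, List[object]],
                regret: Dict[str, float], rng: random.Random) -> EnvConfig:
    """Return the next episode's configuration: only the variable with the
    highest estimated regret is resampled; all others keep their values."""
    var = max(regret, key=regret.get)
    # Prefer a value different from the current one so the variable actually changes.
    candidates = [v for v in domains[var] if v != current[var]] or domains[var]
    new_config = dict(current)  # shallow copy; the input is left untouched
    new_config[var] = rng.choice(candidates)
    return new_config
```

For example, with regret estimates `{"goal": 0.1, "holes": 0.7}`, `next_config` keeps the goal fixed and changes only the hole layout. When estimates are tied, `max` simply returns the first variable; a real curriculum might break ties randomly instead.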

Paper Structure

This paper contains 24 sections, 3 equations, 3 figures.

Figures (3)

  • Figure 1: Quick Chess subgames, increasing in complexity from left to right (image source: narvekar2016source).
  • Figure 2: A sample Frozen Lake environment. On the right-hand side, we present a factored representation of the state. Using this factored representation, the transition function can be factorised using a Dynamic Bayesian Network (see Figure 3 in Appendix A).
  • Figure 3: A Dynamic Bayesian Network for the factored MDP of the Shifting Frozen Lake. The distance matrix (from the goal location), the grid size, the goal location and the hole locations are constant throughout each episode.
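To make the factored representation in Figures 2 and 3 concrete, the Python sketch below stores the Shifting Frozen Lake state as separate variables and applies a transition that updates only the agent's position, copying the episode-constant factors (goal location, hole locations, grid size) unchanged. This mirrors a DBN-style factorisation, P(s' | s, a) = ∏_i P(s'_i | parents(s'_i), a), under simplifying assumptions: moves are deterministic (no slipping), the next agent position is assumed to depend only on the previous position, the action and the grid size, and the distance matrix is omitted. The names `FactoredState`, `step` and `MOVES` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

Pos = Tuple[int, int]

# Action name -> (row delta, column delta); deterministic moves are assumed.
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}


@dataclass(frozen=True)
class FactoredState:
    agent: Pos             # changes within an episode
    goal: Pos              # constant within an episode
    holes: FrozenSet[Pos]  # constant within an episode
    size: int              # constant within an episode (size x size grid)


def step(s: FactoredState, action: str) -> FactoredState:
    """One transition under the assumed factorisation: only `agent` is updated,
    from (agent, action, size); every other factor is copied unchanged."""
    dr, dc = MOVES[action]
    # Clamp to the grid so moves into a wall leave the agent in place.
    r = min(max(s.agent[0] + dr, 0), s.size - 1)
    c = min(max(s.agent[1] + dc, 0), s.size - 1)
    return replace(s, agent=(r, c))
```

Because each factor is updated by its own rule, a change to an episode-constant variable (for instance, moving the goal between episodes) affects only the factors that depend on it, which is what makes varying one variable at a time a natural curriculum design.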