Curricula for Learning Robust Policies with Factored State Representations in Changing Environments

Panayiotis Panayiotou; Özgür Şimşek

TL;DR

This paper experimentally demonstrates three simple curricula, such as varying only the variable of highest regret between episodes, that can significantly enhance policy robustness, offering practical insights for reinforcement learning in complex environments.

Abstract

Robust policies enable reinforcement learning agents to adapt to and operate in unpredictable, changing real-world environments. Factored representations, which break down complex state and action spaces into distinct components, can improve generalization and sample efficiency in policy learning. In this paper, we explore how the curriculum of an agent using a factored state representation affects the robustness of the learned policy. We experimentally demonstrate three simple curricula, such as varying only the variable of highest regret between episodes, that can significantly enhance policy robustness, offering practical insights for reinforcement learning in complex environments.

Paper Structure (24 sections, 3 equations, 3 figures)

Introduction
Preliminaries
Markov Decision Processes.
Reinforcement Learning.
Dynamic Bayesian Networks.
Factored Representations.
Factorisation of MDPs.
Distribution Shifts.
Low-Regret Policies.
Background
The Shifting Frozen Lake
Environment shifts.
Experiments
Curriculum (A): No Shifting to Random Shifting.
Curriculum (B): No Shifting to Single Random Variable Shifting.
...and 9 more sections

Figures (3)

Figure 1: Quick Chess subgames, increasing in complexity from left to right (image source: narvekar2016source).
Figure 2: A sample Frozen Lake environment. On the right-hand side, we present a factored representation of the state. Using this factored representation, the transition function can be factorised using a Dynamic Bayesian Network (see Figure 3 in Appendix A).
Figure 3: A Dynamic Bayesian Network for the factored MDP of the Shifting Frozen Lake. The distance matrix (from the goal location), the grid size, the goal location and the hole locations are constant throughout each episode.
