
TEA: Trajectory Encoding Augmentation for Robust and Transferable Policies in Offline Reinforcement Learning

Batıkan Bora Ormancı, Phillip Swazinna, Steffen Udluft, Thomas A. Runkler

TL;DR

This paper tackles offline reinforcement learning under unseen variations in environment dynamics. It proposes Trajectory Encoding Augmentation (TEA), which learns a 4-D latent representation of environment dynamics from sequences of 16 state-action pairs and appends it to the observed state. Using BCQ (Batch-Constrained deep Q-learning) as the offline learner, TEA is evaluated on CartPole variants with varying pole length and cart mass: after training on the source environments, a single policy transfers consistently to ten unseen environments. The results indicate that latent trajectory encodings capture critical environment-specific information, enabling a single policy to generalize across dynamic conditions, and they point to promising offline-to-online extensions.
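The augmentation step described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the encoder is a hypothetical fixed random linear map (`W_enc`) standing in for a trained sequence encoder, the standard 4-D CartPole observation is assumed, the discrete action is stored as one scalar per step, and the helper names `encode_trajectory` and `augment_state` are illustrative. It only shows the shape bookkeeping: 16 pairs of a 4-D state and a 1-D action are flattened to 80 numbers, encoded to 4 numbers, and appended to the current 4-D state.

```python
import numpy as np

SEQ_LEN = 16      # window length stated in the TL;DR
LATENT_DIM = 4    # latent size stated in the TL;DR
STATE_DIM = 4     # assumption: standard CartPole observation (position, velocity, angle, angular velocity)
ACTION_DIM = 1    # assumption: discrete action (0 or 1) stored as a single scalar

rng = np.random.default_rng(0)
# Hypothetical stand-in for a trained sequence encoder: a fixed random linear map.
W_enc = rng.normal(scale=0.1, size=(SEQ_LEN * (STATE_DIM + ACTION_DIM), LATENT_DIM))

def encode_trajectory(states, actions):
    """Map SEQ_LEN (state, action) pairs to a LATENT_DIM-dimensional vector."""
    window = np.concatenate([states, actions], axis=1)  # shape (SEQ_LEN, STATE_DIM + ACTION_DIM)
    return window.reshape(-1) @ W_enc                   # shape (LATENT_DIM,)

def augment_state(state, states, actions):
    """TEA-style augmentation: append the trajectory encoding to the observed state."""
    return np.concatenate([state, encode_trajectory(states, actions)])

# Usage on a synthetic window of transitions.
states = rng.normal(size=(SEQ_LEN, STATE_DIM))
actions = rng.integers(0, 2, size=(SEQ_LEN, ACTION_DIM)).astype(float)
aug = augment_state(states[-1], states, actions)
print(aug.shape)  # (8,)
```

The resulting 8-D vector is what an offline learner such as BCQ would take as its state input in place of the raw observation. How the window is selected at deployment time (for example, a sliding window over the most recent steps) is not specified in this summary and is an assumption here.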

Abstract

In this paper, we investigate offline reinforcement learning (RL) with the goal of training a single robust policy that generalizes effectively across environments with unseen dynamics. We propose a novel approach, Trajectory Encoding Augmentation (TEA), which extends the state space by integrating latent representations of environmental dynamics obtained from sequence encoders, such as autoencoders. Our findings show that incorporating these encodings with TEA improves the transferability of a single policy to novel environments with new dynamics, surpassing methods that rely solely on unmodified states. These results indicate that TEA captures critical, environment-specific characteristics, enabling RL agents to generalize effectively across dynamic conditions.
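The abstract names autoencoders as one kind of sequence encoder. As a hedged illustration only, the sketch below trains a linear autoencoder (a deliberate simplification; the paper's encoder architecture is not described in this summary) with plain gradient descent on flattened 16-step windows. The data are synthetic: each window is generated from 4 hidden factors, a hypothetical stand-in for environment parameters such as pole length and cart mass. The function name `train_linear_autoencoder` and all hyperparameters are assumptions.

```python
import numpy as np

SEQ_LEN, PAIR_DIM, LATENT_DIM = 16, 5, 4  # assumption: 4-D state + 1-D action per step
IN_DIM = SEQ_LEN * PAIR_DIM               # 80 inputs per flattened window

def train_linear_autoencoder(X, latent_dim=LATENT_DIM, lr=0.05, steps=300, seed=0):
    """Fit encoder W_e and decoder W_d by gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    W_e = rng.normal(scale=0.1, size=(X.shape[1], latent_dim))
    W_d = rng.normal(scale=0.1, size=(latent_dim, X.shape[1]))
    n = X.shape[0]
    losses = []
    for _ in range(steps):
        Z = X @ W_e                  # latent codes, shape (n, latent_dim)
        E = Z @ W_d - X              # reconstruction error, shape (n, in_dim)
        losses.append(float(np.sum(E ** 2) / n))
        G = 2.0 * E / n              # gradient of the loss w.r.t. the reconstruction
        grad_d = Z.T @ G             # gradient w.r.t. the decoder
        grad_e = X.T @ (G @ W_d.T)   # gradient w.r.t. the encoder (uses the pre-update decoder)
        W_d -= lr * grad_d
        W_e -= lr * grad_e
    return W_e, W_d, losses

# Synthetic windows driven by 4 hidden factors plus small noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(512, LATENT_DIM))
mixing = rng.normal(size=(LATENT_DIM, IN_DIM)) / np.sqrt(IN_DIM)
X = factors @ mixing + 0.01 * rng.normal(size=(512, IN_DIM))

W_e, W_d, losses = train_linear_autoencoder(X)
codes = X @ W_e  # one 4-D latent per window
```

Here `codes` plays the role of the latent that TEA appends to the state. A nonlinear or recurrent encoder would replace the two matrix products but keep the same train-then-encode pattern.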

Paper Structure

This paper contains 4 sections, 1 equation, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Illustration of cartpoles with varying pole lengths and cart masses.
  • Figure 2: Scatter plot showing pole length and cart mass for source environments (blue) and new environments (red).
  • Figure 3: Performance of TEA relative to the baseline across the new environments, shown as performance ratios. All ratios exceed 1, indicating consistent gains.