TEA: Trajectory Encoding Augmentation for Robust and Transferable Policies in Offline Reinforcement Learning
Batıkan Bora Ormancı, Phillip Swazinna, Steffen Udluft, Thomas A. Runkler
TL;DR
This paper tackles offline reinforcement learning under unseen variations in environment dynamics. It proposes Trajectory Encoding Augmentation (TEA), which encodes sequences of $16$ state-action pairs into a $4$-dimensional latent representation of the environment dynamics and appends this representation to the observed state. With Batch-Constrained deep Q-learning (BCQ) as the offline RL algorithm, TEA is evaluated on CartPole variants with varying pole length and cart mass. After training on a set of source environments, it transfers consistently to ten unseen environments. The results indicate that latent trajectory encodings capture critical environment-specific information, enabling a single policy to generalize across dynamics conditions and suggesting promising directions for offline-to-online extensions.
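The augmentation step described above can be illustrated with a minimal sketch. It assumes CartPole's 4-D observation and a discrete action encoded as a float, and it replaces the trained sequence encoder with a hypothetical `LinearEncoder` using fixed random weights. The window alignment (the last $16$ pairs up to the current step) and all names here are illustrative assumptions, not the authors' implementation. The sketch only shows the data flow: flatten a window of state-action pairs, encode it to a $4$-D latent vector, and concatenate that vector to the state.

```python
import random
from typing import List, Sequence, Tuple

STATE_DIM = 4    # CartPole observation: cart position/velocity, pole angle/angular velocity
WINDOW = 16      # (state, action) pairs per encoded trajectory segment, per the TL;DR
LATENT_DIM = 4   # size of the latent dynamics code, per the TL;DR

Transition = Tuple[Sequence[float], int]  # (state, discrete action)


def flatten_window(window: Sequence[Transition]) -> List[float]:
    """Concatenate WINDOW (state, action) pairs into one flat input vector."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} transitions, got {len(window)}")
    flat: List[float] = []
    for state, action in window:
        flat.extend(float(x) for x in state)
        flat.append(float(action))  # assumption: discrete action fed as a scalar
    return flat


class LinearEncoder:
    """Hypothetical stand-in for a trained sequence encoder (e.g. an autoencoder's
    encoder half). Weights are random here; in practice they would be learned
    from the offline trajectories."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def encode(self, x: Sequence[float]) -> List[float]:
        # One dot product per latent dimension.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def augment_state(state: Sequence[float], window: Sequence[Transition],
                  encoder: LinearEncoder) -> List[float]:
    """Append the latent code of the recent trajectory window to the state."""
    z = encoder.encode(flatten_window(window))
    return list(state) + z


# Usage: a synthetic window of 16 CartPole-like transitions.
rng = random.Random(1)
history = [([rng.uniform(-0.05, 0.05) for _ in range(STATE_DIM)], rng.randint(0, 1))
           for _ in range(WINDOW)]
encoder = LinearEncoder(WINDOW * (STATE_DIM + 1), LATENT_DIM)
current_state = history[-1][0]
aug = augment_state(current_state, history, encoder)
print(len(aug))  # 8: 4 state dimensions + 4 latent dimensions
```

Under these assumptions, the offline learner (BCQ in the paper) would simply receive 8-D inputs instead of 4-D ones. Because the policy architecture itself is unchanged, the same augmentation can be paired with any state-based offline RL algorithm.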
Abstract
In this paper, we investigate offline reinforcement learning (RL) with the goal of training a single robust policy that generalizes effectively to environments with unseen dynamics. We propose a novel approach, Trajectory Encoding Augmentation (TEA), which extends the state space with latent representations of the environment dynamics obtained from sequence encoders such as autoencoders. Our findings show that augmenting states with these encodings improves the transferability of a single policy to novel environments with new dynamics, outperforming policies trained on the unmodified states alone. These results indicate that TEA captures critical environment-specific characteristics, enabling RL agents to generalize effectively across dynamics conditions.
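As a hedged illustration of how an autoencoder could supply these encodings, the standard reconstruction objective is sketched below. The exact loss used by TEA is not stated here. The encoder $E_\phi$, the decoder $D_\psi$, the offline dataset $\mathcal{D}$ of trajectory windows $\tau$ (each $16$ state-action pairs), and the squared-error loss are all assumptions for illustration. The augmented state $\tilde{s}_t$ concatenates the observed state $s_t$ with the latent code $z_t \in \mathbb{R}^4$ of the window associated with step $t$.

$$
z = E_\phi(\tau), \qquad \hat{\tau} = D_\psi(z), \qquad
\mathcal{L}(\phi, \psi) = \mathbb{E}_{\tau \sim \mathcal{D}}\Big[\big\lVert \tau - D_\psi\big(E_\phi(\tau)\big) \big\rVert_2^2\Big], \qquad
\tilde{s}_t = \big[\, s_t ;\ z_t \,\big]
$$

Only the encoder $E_\phi$ is needed at policy-training and deployment time. The decoder serves to force $z$ to retain enough information about the trajectory, and hence about the dynamics that generated it.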
