Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
TL;DR
This paper addresses rapid adaptation in deep reinforcement learning across related tasks by modeling task context with reward machines (RMs) within contextual MDPs. It introduces Contextual PRE-Planning (C-PREP), which generates a context-specific RM, computes an optimal path through that RM using RM-augmented value iteration (VI), and provides the agent with the next desired RM transition along with RM-based reward shaping. Empirical results across four grid-based domains show substantial improvements in few-shot and zero-shot transfer, especially in longer-horizon tasks, with notable gains in TT_AUC and JS and generally positive transfer ratios when using C-PREP. The approach demonstrates the value of symbolic task abstractions for transfer, and the paper outlines future work on RM generation, theoretical guarantees, and broader symbolic representations.
Abstract
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
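The idea of planning over a reward machine and handing the agent its next desired transition can be illustrated with a minimal sketch. This is not the authors' implementation: the toy machine, the symbol names, the function names, the two discount factors, and the choice of potential-based shaping are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reward machine: abstract state -> {symbol: (next state, reward)}.
RewardMachine = Dict[str, Dict[str, Tuple[str, float]]]

# Toy machine (assumed for illustration): fetch a key, open a door, reach the goal.
EXAMPLE_RM: RewardMachine = {
    "u0": {"got_key": ("u1", 0.0), "hit_lava": ("u_fail", -1.0)},
    "u1": {"opened_door": ("u2", 0.0)},
    "u2": {"reached_goal": ("u_goal", 1.0)},
    "u_goal": {},  # terminal
    "u_fail": {},  # terminal
}


def rm_value_iteration(rm: RewardMachine, gamma: float = 0.9,
                       tol: float = 1e-8) -> Dict[str, float]:
    """Value iteration over the RM graph, treating each symbol as an action."""
    values = {u: 0.0 for u in rm}
    while True:
        delta = 0.0
        for u, edges in rm.items():
            if not edges:  # terminal states keep value 0
                continue
            best = max(r + gamma * values[v] for v, r in edges.values())
            delta = max(delta, abs(best - values[u]))
            values[u] = best
        if delta < tol:
            return values


def next_desired_transition(rm: RewardMachine, values: Dict[str, float],
                            u: str, gamma: float = 0.9) -> Optional[str]:
    """Symbol of the best outgoing RM transition from u, or None if terminal."""
    edges = rm[u]
    if not edges:
        return None
    return max(edges, key=lambda s: edges[s][1] + gamma * values[edges[s][0]])


def shaping_reward(values: Dict[str, float], u: str, u_next: str,
                   gamma_env: float = 0.99) -> float:
    """Potential-based shaping term (an assumed choice), with RM values as potentials."""
    return gamma_env * values[u_next] - values[u]


values = rm_value_iteration(EXAMPLE_RM)
desired = next_desired_transition(EXAMPLE_RM, values, "u0")
bonus = shaping_reward(values, "u0", "u1")
```

In this sketch, `desired` is the symbol an agent in abstract state `u0` would receive as part of its observation, and `bonus` is the extra reward it would collect for achieving that transition. Keeping the planning discount separate from the environment discount is a design choice of the sketch, made so that progress along the planned path yields a positive shaping term.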
