Zero-Shot Action Generalization with Limited Observations
Abdullah Alchihabi, Hanping Zhang, Yuhong Guo
TL;DR
This work tackles zero-shot action generalization in reinforcement learning under the constraint of limited action observations. It introduces AGLO, a two-module framework that first learns discriminative action embeddings from few observations via a coarse encoder, a graph-based refined encoder, and a hierarchical variational auto-encoder, and then trains a policy with augmented synthetic action representations produced by embedding-space mixup. The approach yields superior generalization to unseen actions across challenging CREATE tasks, outperforming prior zero-shot methods that require many observations or fine-tuning. The results demonstrate the practicality of zero-shot generalization with scarce data and highlight the importance of carefully designed representation learning and augmentation strategies for robust policy transfer.
Abstract
Reinforcement Learning (RL) has demonstrated remarkable success in solving sequential decision-making problems. However, in real-world scenarios, RL agents often struggle to generalize when faced with actions that were not encountered during training. Some previous works on zero-shot action generalization rely on large datasets of action observations to capture the behaviors of new actions, making them impractical for real-world applications. In this paper, we introduce a novel zero-shot framework, Action Generalization from Limited Observations (AGLO). Our framework has two main components: an action representation learning module and a policy learning module. The action representation learning module extracts discriminative embeddings of actions from limited observations, while the policy learning module leverages the learned action representations, along with augmented synthetic action representations, to learn a policy capable of handling tasks with unseen actions. The experimental results demonstrate that our framework significantly outperforms state-of-the-art methods for zero-shot action generalization across multiple benchmark tasks, showcasing its effectiveness in generalizing to new actions with minimal action observations.
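As a rough illustration of the augmented synthetic action representations, which the TL;DR describes as produced by embedding-space mixup, the sketch below forms convex combinations of randomly paired action embeddings. It is a minimal sketch under stated assumptions, not the AGLO implementation: the function name mixup_action_embeddings, the Beta(alpha, alpha) mixing coefficient (the usual choice in standard mixup), the value alpha=0.4, and the random pairing of distinct actions are all hypothetical choices not specified in this summary.

```python
import numpy as np


def mixup_action_embeddings(embeddings, num_synthetic, alpha=0.4, rng=None):
    """Return convex combinations of randomly paired action embeddings.

    Hypothetical helper: the pairing scheme and the Beta(alpha, alpha)
    coefficient are assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    embeddings = np.asarray(embeddings, dtype=float)
    n = embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two action embeddings to mix")

    # Pick two distinct actions for every synthetic sample.
    first = rng.integers(0, n, size=num_synthetic)
    offset = rng.integers(1, n, size=num_synthetic)  # values in [1, n - 1]
    second = (first + offset) % n

    # One mixing coefficient per synthetic sample, broadcast over dimensions.
    lam = rng.beta(alpha, alpha, size=(num_synthetic, 1))
    return lam * embeddings[first] + (1.0 - lam) * embeddings[second]


# Usage: 5 seen actions with 8-dimensional embeddings (random stand-ins).
rng = np.random.default_rng(0)
seen = rng.normal(size=(5, 8))
synthetic = mixup_action_embeddings(seen, num_synthetic=10, rng=rng)

# The policy would then be trained on the seen and synthetic embeddings together.
augmented = np.concatenate([seen, synthetic], axis=0)
```

Because each synthetic embedding is a convex combination of two seen embeddings, it stays inside the region spanned by the seen actions. The intent of such augmentation is to expose the policy to a denser set of action representations than the seen actions alone provide.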
