Zero-Shot Action Generalization with Limited Observations

Abdullah Alchihabi, Hanping Zhang, Yuhong Guo

TL;DR

This work tackles zero-shot action generalization in reinforcement learning under the constraint of limited action observations. It introduces AGLO, a two-module framework that first learns discriminative action embeddings from few observations via a coarse encoder, a graph-based refined encoder, and a hierarchical variational auto-encoder, and then trains a policy with augmented synthetic action representations produced by embedding-space mixup. The approach yields superior generalization to unseen actions across challenging CREATE tasks, outperforming prior zero-shot methods that require many observations or fine-tuning. The results demonstrate the practicality of zero-shot generalization with scarce data and highlight the importance of carefully designed representation learning and augmentation strategies for robust policy transfer.

Abstract

Reinforcement Learning (RL) has demonstrated remarkable success in solving sequential decision-making problems. However, in real-world scenarios, RL agents often struggle to generalize when faced with unseen actions that were not encountered during training. Some previous works on zero-shot action generalization rely on large datasets of action observations to capture the behaviors of new actions, making them impractical for real-world applications. In this paper, we introduce a novel zero-shot framework, Action Generalization from Limited Observations (AGLO). Our framework has two main components: an action representation learning module and a policy learning module. The action representation learning module extracts discriminative embeddings of actions from limited observations, while the policy learning module leverages the learned action representations, along with augmented synthetic action representations, to learn a policy capable of handling tasks with unseen actions. The experimental results demonstrate that our framework significantly outperforms state-of-the-art methods for zero-shot action generalization across multiple benchmark tasks, showcasing its effectiveness in generalizing to new actions with minimal action observations.
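The augmented synthetic action representations are described in the TL;DR as the product of embedding-space mixup. As a rough illustration of that idea, and not the authors' implementation, the sketch below interpolates between randomly paired action embeddings. The function name `mixup_action_embeddings`, the Beta-distributed mixing coefficient, the value of `alpha`, and the random pairing scheme are all assumptions made for this example; the summary does not specify how AGLO samples pairs or coefficients.

```python
import numpy as np


def mixup_action_embeddings(embeddings, num_synthetic, alpha=0.4, seed=0):
    """Create synthetic action embeddings by convex interpolation (mixup).

    embeddings:    array of shape (num_actions, dim), the learned embeddings.
    num_synthetic: number of synthetic embeddings to generate.
    alpha:         Beta(alpha, alpha) parameter (hypothetical choice).
    """
    rng = np.random.default_rng(seed)
    num_actions = embeddings.shape[0]

    # Pick two (possibly identical) source actions for each synthetic sample.
    i = rng.integers(0, num_actions, size=num_synthetic)
    j = rng.integers(0, num_actions, size=num_synthetic)

    # Mixing coefficient in [0, 1], one per synthetic sample.
    lam = rng.beta(alpha, alpha, size=(num_synthetic, 1))

    # Convex combination of the two source embeddings.
    return lam * embeddings[i] + (1.0 - lam) * embeddings[j]
```

Because each synthetic embedding is a convex combination of two real ones, it stays inside the per-dimension range of the training embeddings, which is one plausible reason interpolation of this kind can expose a policy to a denser set of action representations without generating new observations.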

Paper Structure

This paper contains 25 sections, 14 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Estimated running time (in hours) for varying numbers of action observations in the CREATE environment. Previous studies utilized 45 observations per action (Jain et al., 2020). (a) Running time estimate for generating action observations. (b) Running time estimate for action representation learning.
  • Figure 2: Policy learning analysis of VAE, HVAE and our AGLO on the Push task, with 5 and 7 observations per action, evaluated using the Target hit and reward metrics.