Offline Policy Learning via Skill-step Abstraction for Long-horizon Goal-Conditioned Tasks

Donghoon Kim; Minjong Yoo; Honguk Woo

TL;DR

This paper addresses reward sparsity in long-horizon goal-conditioned (GC) policy learning by introducing GLvSA, an offline framework that learns a skill-step model and a GC policy in tandem. By encoding temporally abstracted skills and environment dynamics into a latent space and using model-guided rollouts, GLvSA expands the set of achievable goals and improves adaptation to shifted goal distributions. A modular policy hierarchy supports near-term skill-step goal generation and parameter-efficient online fine-tuning, yielding strong zero-shot and few-shot performance in maze navigation and Franka kitchen tasks. The results indicate that offline skill-step abstraction can outperform existing GC policy learning and skill-based RL methods, which makes it a practical option for scalable, data-efficient long-horizon control.

Abstract

Goal-conditioned (GC) policy learning often struggles with reward sparsity when confronting long-horizon goals. To address this challenge, we explore skill-based GC policy learning in offline settings, where skills are acquired from existing data and long-horizon goals are decomposed into sequences of near-term goals that align with these skills. Specifically, we present an "offline GC policy learning via skill-step abstraction" framework (GLvSA) tailored to long-horizon GC tasks affected by goal distribution shifts. In the framework, a GC policy is progressively learned offline in conjunction with the incremental modeling of skill-step abstractions on the data. We also devise a GC policy hierarchy that not only accelerates GC policy learning within the framework but also allows for parameter-efficient fine-tuning of the policy. Through experiments in the maze and Franka kitchen environments, we demonstrate the superiority and efficiency of the GLvSA framework in adapting GC policies to a wide range of long-horizon goals. The framework achieves competitive zero-shot and few-shot adaptation performance, outperforming existing GC policy learning and skill-based methods.
