Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning

Minjong Yoo; Sangwoo Cho; Honguk Woo

TL;DR

This work addresses offline multi-task RL under heterogeneous data quality by introducing Skill Regularized Task Decomposition (SRTD). It jointly learns skill embeddings and task embeddings in a shared latent space using a Wasserstein auto-encoder, with quality-aware regularization that biases subtask representations toward high-quality trajectories. A data augmentation scheme of imaginary demonstrations, generated via a coupled skill- and task-decoder, enhances offline learning without environment interaction. Empirical results on MT10 robotic tasks and Airsim drone navigation show that SRTD, especially with imaginary demonstrations (SRTD+ID), achieves robust performance across mixed-quality datasets and outperforms state-of-the-art baselines, validating the approach for data-efficient, multi-task offline RL in robotics.

Abstract

Reinforcement learning (RL) with diverse offline datasets can leverage the relation of multiple tasks and the common skills learned across those tasks, hence allowing us to deal with real-world complex problems efficiently in a data-driven way. In offline RL, where only offline data is used and online interaction with the environment is restricted, it remains difficult to learn an optimal policy for multiple tasks, especially when the data quality varies across the tasks. In this paper, we present a skill-based multi-task RL technique on heterogeneous datasets that are generated by behavior policies of different quality. To learn the shareable knowledge across those datasets effectively, we employ a task decomposition method in which common skills are jointly learned and used as guidance to reformulate a task into shared and achievable subtasks. In this joint learning, we use a Wasserstein auto-encoder (WAE) to represent both skills and tasks on the same latent space and use a quality-weighted loss as a regularization term to induce tasks to be decomposed into subtasks that are more consistent with high-quality skills than with others. To improve the performance of offline RL agents learned on the latent space, we also augment datasets with imaginary trajectories relevant to high-quality skills for each task. Through experiments, we show that our multi-task offline RL approach is robust to mixed configurations of different-quality datasets and that it outperforms other state-of-the-art algorithms on several robotic manipulation tasks and drone navigation tasks.
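To make the idea of a quality-weighted regularization term more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each sub-trajectory already has a task embedding and a skill embedding in the same latent space, and that quality is measured by the sub-trajectory return. The function names `quality_weights` and `skill_regularization`, the softmax weighting, and the `temperature` parameter are all hypothetical choices made for illustration; the paper's actual loss may differ.

```python
import numpy as np


def quality_weights(returns, temperature=1.0):
    """Turn sub-trajectory returns into normalized weights.

    Assumption: a softmax over returns, so that higher-return
    sub-trajectories receive larger weights. The paper may weight
    quality differently.
    """
    r = np.asarray(returns, dtype=float)
    scaled = (r - r.max()) / temperature  # subtract max for numerical stability
    w = np.exp(scaled)
    return w / w.sum()


def skill_regularization(task_z, skill_b, returns, temperature=1.0):
    """Quality-weighted squared distance between task and skill embeddings.

    task_z, skill_b: arrays of shape (batch, latent_dim) on a shared latent space.
    returns: per-sub-trajectory returns, shape (batch,).
    Minimizing this pulls the embeddings of high-return sub-trajectories
    toward their skills more strongly than those of low-return ones.
    """
    task_z = np.asarray(task_z, dtype=float)
    skill_b = np.asarray(skill_b, dtype=float)
    w = quality_weights(returns, temperature)
    sq_dist = np.sum((task_z - skill_b) ** 2, axis=1)
    return float(np.sum(w * sq_dist))
```

In this sketch, a pair whose sub-trajectory has a large return dominates the loss, so a gradient-based learner would move that task embedding close to its skill embedding, while low-return pairs are left comparatively free.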

Paper Structure (14 sections, 9 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Overall Approach
Preliminary
Overall Approach For Multi-task Offline RL
Task Decomposition with Quality-aware Skill Regularization
Learning Skill Embeddings
Skill-regularized Task Decomposition
Data Augmentation by Imaginary Demonstrations
Experiments
Meta-world Tests
A Case Study for Airsim-based Drone Navigation
Related Work
Conclusion
Acknowledgement

Figures (6)

Figure 1: Our proposed multi-task offline RL model consisting of (a) task decomposition and (b) data augmentation. In (a), sub-trajectories from static datasets are converted into skill embeddings and task embeddings on the same latent space, which together enable the decomposition of tasks into achievable subtasks. The blue-colored dots denote task embeddings that model the environment, and the green-colored dots denote skill embeddings. In the green-colored dotted circle, a sub-trajectory $\tau_1$ of task $1$ is embedded as $z_1$ and then located as $z'_1$ closer to its corresponding high-quality skill $b_1$ (the action sequence of the sub-trajectory $\tau_1$ with large returns), while in the red-colored dotted circle, another sub-trajectory $\tau_N$ of task $N$ is embedded as $z_N$ and located as $z'_N$ further from its corresponding low-quality skill $b_N$ (the action sequence of the sub-trajectory $\tau_N$ with small returns). In (b), for training offline RL agents, imaginary trajectories similar to expert demonstrations are sampled from the latent space and added to the datasets.
Figure 2: Task decomposition procedure with quality-aware skill regularization. On the right side of the figure, the red arrow denotes $L_{PR}$ in \ref{loss_pr}, which makes low-quality sub-trajectories stretch within the prior distribution of tasks (in gray), and the blue arrow denotes $L_{SR}$ in \ref{sr_loss}, which makes high-quality sub-trajectories shrink around the distribution of skills (in blue).
Figure 3: Data augmentation by imaginary demonstrations
Figure 4: Examples of state-action pair distributions. (c) The imaginary demonstrations generated from (a) the source dataset resemble (b) the expert dataset more closely than the source dataset itself does, whereas (d) the dataset augmented with Gaussian noise does not. The table lists the average rewards calculated by reward relabeling on the datasets in (a)-(d), respectively, illustrating the quality gain of (c) over (d) in terms of average rewards.
Figure 5: Effects of (a) quality-aware skill regularization and (b) imaginary demonstrations
...and 1 more figure
