Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation

Hao Zhang, Hao Wang, Xiucai Huang, Wenrui Chen, Zhen Kan

TL;DR

This work tackles the sample inefficiency and limited task-semantic guidance of traditional RL in long-horizon robotic manipulation by introducing HyTL, a Temporal-Logic-guided Hybrid policy. HyTL combines a high-level waypoint planning module, a middle-level primitive selection policy, and a low-level parameterization policy, all conditioned on an LTL representation encoded by a Transformer and optimized within a SAC framework. Interpretability comes from AttCAT, which gives insight into how LTL tokens influence decisions. Empirical results across four challenging tasks demonstrate improved sampling efficiency, faster convergence, and enhanced interpretability compared to baselines, with notable gains on the most demanding Peg Insertion task.

Abstract

Reinforcement Learning (RL) based methods have been increasingly explored for robot learning. However, RL-based methods often suffer from low sampling efficiency in the exploration phase, especially for long-horizon manipulation tasks, and generally neglect task-level semantic information, resulting in delayed convergence or even task failure. To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL), which leverages three decision levels to improve the agent's performance. Specifically, the task specifications are encoded via linear temporal logic (LTL) to improve performance and offer interpretability. A waypoint planning module, designed with feedback from the LTL-encoded task level, serves as the high-level policy and improves exploration efficiency. The middle-level policy selects which behavior primitives to execute, and the low-level policy specifies the corresponding parameters to interact with the environment. We evaluate HyTL on four challenging manipulation tasks, and the results demonstrate its effectiveness and interpretability. Our project is available at: https://sites.google.com/view/hytl-0257/.
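
To make the idea of an LTL-encoded task specification more concrete, the following is a minimal sketch of standard LTL progression, the generic technique named in the caption of Figure 2(b): after each environment step, the formula is rewritten into what still remains to be satisfied. This is not the authors' implementation. The tuple encoding, the function names, and the propositions `grasped` and `collision` are all assumptions made for illustration.

```python
# Minimal LTL progression sketch (illustrative only, not the paper's code).
# Formulas are nested tuples; True/False are the constant formulas.
#   ("ap", name), ("not", f), ("and", f, g), ("or", f, g),
#   ("next", f), ("eventually", f), ("always", f), ("until", f, g)

def _and(a, b):
    """Conjunction with constant folding."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)

def _or(a, b):
    """Disjunction with constant folding."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)

def progress(formula, true_props):
    """Return the formula that must hold from the next step onward,
    given the set of propositions that are true at the current step."""
    if formula is True or formula is False:
        return formula
    op = formula[0]
    if op == "ap":
        return formula[1] in true_props
    if op == "not":
        inner = progress(formula[1], true_props)
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    if op == "and":
        return _and(progress(formula[1], true_props),
                    progress(formula[2], true_props))
    if op == "or":
        return _or(progress(formula[1], true_props),
                   progress(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "eventually":  # F f  ==  f or X(F f)
        return _or(progress(formula[1], true_props), formula)
    if op == "always":      # G f  ==  f and X(G f)
        return _and(progress(formula[1], true_props), formula)
    if op == "until":       # f U g  ==  g or (f and X(f U g))
        return _or(progress(formula[2], true_props),
                   _and(progress(formula[1], true_props), formula))
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical task: eventually grasp the object, and never collide.
SAFE = ("always", ("not", ("ap", "collision")))
TASK = ("and", ("eventually", ("ap", "grasped")), SAFE)

after_idle = progress(TASK, set())               # nothing happened yet
after_grasp = progress(TASK, {"grasped"})        # only the safety part remains
after_crash = progress(TASK, {"collision"})      # task can no longer succeed
```

In this sketch a formula that progresses to `True` marks the task as satisfied and one that progresses to `False` marks it as violated, which is one common way to derive a task-level signal for the policy.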

Paper Structure (13 sections, 5 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: The waypoints planning module updated via interactions.
  • Figure 2: The framework of HyTL. (a) The outline of Task Representation Module. (b) The LTL progression for progressing LTL formulas. (c) The hybrid decision-making process.
  • Figure 3: Plots of normalized reward curves for the four manipulation tasks.
  • Figure 4: The visualization of action sketches utilizing HyTL.
  • Figure 5: We illustrate the heatmap of the task $\varphi_{\mathsf{cleanup}}$ by normalizing impact scores from different Transformer layers.

Theorems & Definitions (1)

  • Example 1