Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks

Zeyu Feng, Hao Luan, Kevin Yuchen Ma, Harold Soh

TL;DR

The paper tackles long-horizon trajectory planning in offline settings, where temporally extended goals are specified in linear temporal logic (LTL). It introduces Doppler, an offline hierarchical RL framework that models options with diffusion-based policies and employs a product MDP $M_{\Psi}$ to capture non-Markovian LTL rewards. A key contribution is diversity-guided sampling, inspired by determinantal point processes, which generates a diverse set of options that stays within the support of the offline dataset. Empirical results in simulation and on real-world robots show that Doppler achieves higher LTL satisfaction and greater robustness to perturbations than the baselines, illustrating the practicality of closed-loop, temporally-aware planning from offline data.

Abstract

Safe and successful deployment of robots requires not only the ability to generate complex plans but also the capacity to frequently replan and correct execution errors. This paper addresses the challenge of long-horizon trajectory planning under temporally extended objectives in a receding horizon manner. To this end, we propose DOPPLER, a data-driven hierarchical framework that generates and updates plans based on instructions specified in linear temporal logic (LTL). Our method decomposes temporal tasks into a chain of options with hierarchical reinforcement learning from offline non-expert datasets. It leverages diffusion models to generate options with low-level actions. We devise a determinantal-guided posterior sampling technique during batch generation, which improves the speed and diversity of diffusion-generated options, leading to more efficient querying. Experiments on robot navigation and manipulation tasks demonstrate that DOPPLER can generate sequences of trajectories that progressively satisfy the specified formulae for obstacle avoidance and sequential visitation. Demonstration videos are available online at: https://philiptheother.github.io/doppler/.
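The abstract does not spell out how the determinantal guidance is computed. As a rough illustration of the underlying idea only, the sketch below scores a batch of candidates by the log-determinant of a similarity kernel, which is the quantity a determinantal point process favors. The RBF kernel, the `bandwidth` and `jitter` parameters, the function names, and the 2D endpoint features are all assumptions made for this example, not details taken from the paper.

```python
import math


def rbf_kernel(points, bandwidth=1.0):
    """Pairwise RBF similarities between feature vectors (assumed kernel)."""
    kernel = []
    for a in points:
        row = []
        for b in points:
            sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
            row.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
        kernel.append(row)
    return kernel


def log_det_psd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(s)
                total += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return total


def diversity_score(points, bandwidth=1.0, jitter=1e-6):
    """DPP-style diversity: log det of the kernel matrix (higher = more diverse)."""
    kernel = rbf_kernel(points, bandwidth)
    for i in range(len(kernel)):
        kernel[i][i] += jitter  # small ridge for numerical stability
    return log_det_psd(kernel)


# Hypothetical features, e.g. 2D endpoints of three sampled options.
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]

if __name__ == "__main__":
    print("spread   :", diversity_score(spread))
    print("clustered:", diversity_score(clustered))
```

Near-duplicate candidates make the kernel matrix almost singular, so their score drops sharply, whereas well-separated candidates score close to zero. A guided sampler could use the gradient of such a score to push a batch apart, although whether DOPPLER does exactly this is not stated here.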

Paper Structure (13 sections, 7 equations, 5 figures, 4 tables, 2 algorithms)

This paper contains 13 sections, 7 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of the Doppler framework. The model is trained using non-expert trajectory data and LTL specifications. Trajectories are sampled from the diffusion model to generate options, which are then used to form transition tuples that include the state ($s$), option ($o$), and LTL formula ($\varphi$). These tuples are processed with LTL progression to produce the next state ($s'$), next option ($o'$), updated LTL formula ($\varphi'$), and reward ($r$). The hierarchical RL loss is then computed to guide the learning process.
  • Figure 2: Setup and sample trajectories for Maze2D environments. (a) and (d) depict Maze2D Medium and Large, each containing six non-overlapping regions (hatched squares labeled with $p_x$) used to evaluate atomic propositions in $\mathcal{P}$. The agent is tasked with visiting specific regions in different temporally extended sequences. (b) and (e) illustrate the trajectories generated by LTLDoG and our proposed method (Doppler) (from blue to red) under the specification $\varphi = \neg p_3 \operatorname{\mathsf{U}} (p_0 \wedge (\neg p_1 \operatorname{\mathsf{U}} p_4))$. (c) and (f) show trajectories generated under the specification $\varphi = \neg p_0 \operatorname{\mathsf{U}} (p_3 \wedge (\neg p_4 \operatorname{\mathsf{U}} p_5))$. Our method (Doppler) successfully satisfies these complex LTL specifications by avoiding regions with $\neg$ propositions (red zones) before reaching the designated green regions, as demonstrated in panels (e) and (f).
  • Figure 3: The PushT manipulation environment. (a) A robot arm's end effector (circles filled in blue) should manipulate the T block (gray) to a goal pose (green) and visit some regions (hollow circles marked with $p_x$) under different temporally-extended orders before completion. (b) LTLDoG-R neither complies with the LTL specification nor completes the manipulation. (c) In contrast, Doppler satisfies the LTL specification and completes the manipulation task.
  • Figure 4: Real world environments for quadruped robot navigation.
  • Figure 5: Results in real-world rooms. Each room has 4 key locations ((a) and (d)). The instructed LTL is $\neg \text{Table} \operatorname{\mathsf{U}} (\text{Screen} \wedge (\neg \text{Kitchen} \operatorname{\mathsf{U}} \text{Door}))$ for the lab (first row) and $\neg \text{Door} \operatorname{\mathsf{U}} (\text{Corridor} \wedge (\neg \text{Table} \operatorname{\mathsf{U}} \text{Seat}))$ for the office (second row). LTLDoG is unable to recover from perturbations (b) and cannot generate a valid plan (e), while ours ((c) and (f)) can achieve both.
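
The LTL progression step mentioned in the Figure 1 caption can be illustrated with the specification from Figure 2, $\varphi = \neg p_3 \operatorname{\mathsf{U}} (p_0 \wedge (\neg p_1 \operatorname{\mathsf{U}} p_4))$. The sketch below uses the standard textbook progression rules on a small tuple-based formula encoding. The encoding, the function names, and the sparse reward (+1 when the formula is satisfied, -1 when it is violated, 0 otherwise) are assumptions made for illustration and are not taken from the paper.

```python
# Formulas are True/False, an atomic proposition (str), or a tuple:
# ("not", f), ("and", f, g), ("or", f, g), ("until", f, g).

def l_not(f):
    if isinstance(f, bool):
        return not f
    return ("not", f)


def l_and(f, g):
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True or f == g:
        return f
    return ("and", f, g)


def l_or(f, g):
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False or f == g:
        return f
    return ("or", f, g)


def progress(formula, labels):
    """Rewrite `formula` after observing the set of true propositions `labels`."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return formula in labels
    op = formula[0]
    if op == "not":
        return l_not(progress(formula[1], labels))
    if op == "and":
        return l_and(progress(formula[1], labels), progress(formula[2], labels))
    if op == "or":
        return l_or(progress(formula[1], labels), progress(formula[2], labels))
    if op == "until":
        _, left, right = formula
        # f U g holds if g holds now, or f holds now and f U g holds afterwards.
        return l_or(progress(right, labels), l_and(progress(left, labels), formula))
    raise ValueError(f"unknown operator: {op!r}")


def reward(formula):
    """Hypothetical sparse reward: +1 satisfied, -1 violated, 0 still pending."""
    if formula is True:
        return 1.0
    if formula is False:
        return -1.0
    return 0.0


def run(formula, trace):
    """Progress `formula` along a trace of label sets; stop once it is decided."""
    rewards = []
    for labels in trace:
        formula = progress(formula, labels)
        rewards.append(reward(formula))
        if isinstance(formula, bool):
            break
    return formula, rewards


# Figure 2 specification: avoid p3 until p0, then avoid p1 until p4.
PHI = ("until", ("not", "p3"), ("and", "p0", ("until", ("not", "p1"), "p4")))

if __name__ == "__main__":
    print(run(PHI, [set(), {"p0"}, set(), {"p4"}]))  # reaches p0, then p4
    print(run(PHI, [set(), {"p3"}]))                 # enters p3 too early
```

Because the progressed formula summarizes everything the history implies about the remaining task, pairing it with the environment state gives a Markovian description of an otherwise non-Markovian objective, which is the role a product MDP plays.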