Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon

TL;DR

Logic-Skill Programming (LSP) tackles long-horizon manipulation by sequencing independently learned, task-agnostic skills without requiring an explicit symbolic goal. It casts sequencing as a first-order extension of a mathematical program that maximizes the sum of skill value functions plus an evaluation of the final geometric configuration. The value-function space is represented in Tensor Train format, and the solver interleaves symbolic search (MCTS) with subgoal optimization (TTPI + CEM-MD). The approach yields multiple skill skeletons and subgoal sequences, is demonstrated in non-prehensile, partly prehensile, and prehensile domains, including real-robot experiments, and approximates cumulative rewards better than the RL baselines it is compared with. By combining learned skill policies with global optimization over a compact value-function representation, it produces feasible, near-optimal plans that remain robust to contact uncertainty and external disturbances.

Abstract

Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use tensor train factorization to construct the value function space, and alternate between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions approximate cumulative rewards better than state-of-the-art reinforcement learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.
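
The objective described in the abstract can be sketched as a single program. The block below is a hedged reading of the prose, not the paper's own equation: the arguments passed to each value function, the indexing convention $\overline{\bm{x}}_{0_T} = \overline{\bm{x}}_0$, and the symbolic successor function $\mathrm{succ}$ are assumptions introduced here for illustration. The symbols $\Psi$, $V^{\pi_k}$, $a_{1:K}$, $s_k$, and $\overline{\bm{x}}_{1_T:K_T}$ follow the figure captions on this page.

```latex
% Sketch only: value-function arguments and succ(.) are assumed, not from the paper.
\begin{aligned}
\max_{K,\; a_{1:K},\; \overline{\bm{x}}_{1_T:K_T}} \quad
  & \sum_{k=1}^{K} V^{\pi_k}\!\left(\overline{\bm{x}}_{(k-1)_T},\, \overline{\bm{x}}_{k_T}\right)
    + \Psi\!\left(\overline{\bm{x}}_{K_T}\right) \\
\text{s.t.} \quad
  & \overline{\bm{x}}_{0_T} = \overline{\bm{x}}_0, \\
  & s_k = \mathrm{succ}(s_{k-1}, a_k), \quad k = 1, \dots, K .
\end{aligned}
```

The discrete variables (the skeleton length $K$ and the skills $a_{1:K}$) are what the symbolic search explores, while the continuous subgoals are what the skill value optimization tunes for a fixed skeleton.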

Paper Structure (19 sections, 14 equations, 5 figures, 5 tables, 2 algorithms)

This paper contains 19 sections, 14 equations, 5 figures, 5 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Overview of the proposed approach: Given the evaluation function $\Psi$ of the final configuration, along with the initial symbolic state $s_0$ and geometric state $\overline{\bm{x}}_0$, the objective of LSP is to find a solution that can accomplish the task with minimal control costs. A task-agnostic skill library is pretrained, consisting of $N$ skill operators $\mathcal{A} = \{a_{1:N}\}$, along with corresponding value functions $\mathcal{V} = \{V^{\pi_{1:N}}\}$ and policies $\mathcal{P}=\{\pi_{1:N}\}$ in Tensor Train format. LSP solves this problem by alternating between symbolic search and skill value optimization for joint logic-geometric reasoning. Symbols $s_{1:K}$ are used as constraints for skill optimization, while skill optimization is used to check skeleton feasibility and final configuration performance, with a feedback reward $\overline{r}$ informing the symbolic search. This results in the appropriate skill skeleton $a_{1:K}$ and subgoal sequence $\overline{\bm{x}}_{1_T:K_T}$, which are then combined with the skill policies $\pi_{1:K}$ existing in the skill library to actuate the real robot. Notably, the gray channel with symbolic final state $s_K$ is interrupted because our framework eliminates the need for a symbolic target goal $s_T$, while such information is typically required in existing sampling-based sequential skill planning methods.
  • Figure 2: Three sequential manipulation domains, including both prehensile and non-prehensile manipulation primitives. The transparent object represents the final target configuration in each domain.
  • Figure 3: The action skeletons obtained by LSP for the three domains. The dot marks the start of the skill sequence. Each color represents one solution, with black lines indicating the segments shared across solutions. The red star marks the end of the skill skeleton.
  • Figure 4: The pushing subtask obtained for the PPM domain. Both panels (the optimal solution and the feasible solution) start from the same initial configuration. The objective is to use the robot end effector (pusher) to move the slider to the green line ($x=0$), which represents the edge of the table. LSP finds the solution with the highest value in the space, which requires less control effort, while STAP returns one that involves multiple face switches.
  • Figure 5: Non-prehensile manipulation domain task. The system is initialized as in (a), and the objective is to manipulate the box into the target configuration shown in (g). The first stage pushes the box towards the wall while rotating it by $90^\circ$. We also apply an external disturbance to test the skill policy (c). After the pushing stage, the robot switches to the pivoting skill (d, e), followed by pulling (f), until the final geometric configuration is reached.
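
The alternation described in the Figure 1 caption (symbolic states constrain which skeletons are admissible, subgoal optimization scores each skeleton, and the score is fed back to the search) can be illustrated with a small toy. The sketch below is not the authors' implementation and rests on several assumptions: the geometric state is a single scalar position, the three value functions and the final evaluation are hand-written stand-ins for learned Tensor Train models, exhaustive enumeration replaces MCTS, and a plain cross-entropy method replaces TTPI and CEM-MD. All names (`SKILLS`, `skill_value`, `final_evaluation`, `plan`) are hypothetical. The push, pivot, pull structure is loosely modeled on the task in Figure 5.

```python
import itertools
import random

# Hypothetical symbolic model: skill -> (required symbol, resulting symbol).
SKILLS = {
    "push": ("free", "free"),
    "pivot": ("free", "held"),
    "pull": ("held", "held"),
}
WALL, TARGET, X0, S0 = 1.0, -0.5, 0.0, "free"

def directed_cost(d):
    """Unit cost for moving in the allowed direction, steep penalty otherwise."""
    return -d if d >= 0.0 else 50.0 * d

def skill_value(skill, x_start, x_goal):
    """Toy stand-in for a learned skill value function (assumed form)."""
    if skill == "push":   # moves the object toward the wall (+x)
        return directed_cost(x_goal - x_start)
    if skill == "pull":   # moves the object away from the wall (-x)
        return directed_cost(x_start - x_goal)
    # pivot: only cheap at the wall, and it does not translate the object
    return -0.2 - 50.0 * (x_goal - WALL) ** 2 - 50.0 * (x_goal - x_start) ** 2

def final_evaluation(x):
    """Toy stand-in for Psi: purely geometric, no symbolic goal."""
    return -100.0 * (x - TARGET) ** 2

def plan_value(skeleton, subgoals):
    """Sum of skill values along the plan plus the final evaluation."""
    total, x = 0.0, X0
    for skill, goal in zip(skeleton, subgoals):
        total += skill_value(skill, x, goal)
        x = goal
    return total + final_evaluation(x)

def symbolically_valid(skeleton):
    """Check that each skill's precondition matches the current symbol."""
    s = S0
    for skill in skeleton:
        pre, effect = SKILLS[skill]
        if pre != s:
            return False
        s = effect
    return True

def optimize_subgoals(skeleton, rng, iters=60, samples=300, elites=30):
    """Plain cross-entropy method over the subgoal vector of one skeleton."""
    k = len(skeleton)
    mean, std = [0.0] * k, [1.0] * k
    for _ in range(iters):
        pop = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)]
        pop.sort(key=lambda g: plan_value(skeleton, g), reverse=True)
        top = pop[:elites]
        mean = [sum(g[i] for g in top) / elites for i in range(k)]
        std = [(sum((g[i] - mean[i]) ** 2 for g in top) / elites) ** 0.5 + 1e-6
               for i in range(k)]
    return mean, plan_value(skeleton, mean)

def plan(max_len=3, seed=0):
    """Enumerate admissible skeletons and keep the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for k in range(1, max_len + 1):
        for skeleton in itertools.product(SKILLS, repeat=k):
            if not symbolically_valid(skeleton):
                continue  # pruned by the symbolic constraints
            subgoals, value = optimize_subgoals(skeleton, rng)
            if best is None or value > best[2]:
                best = (skeleton, subgoals, value)
    return best

best_skeleton, best_subgoals, best_value = plan()
print(best_skeleton, [round(x, 2) for x in best_subgoals], round(best_value, 2))
```

In this toy the target lies on the far side of the start position, so no single skill reaches it cheaply: pulling is only symbolically available after a pivot, and pivoting is only cheap at the wall. The planner therefore has to discover the skeleton from the geometric score alone, which is the role the feedback reward $\overline{r}$ plays in the caption.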