Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning
Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon
TL;DR
Logic-Skill Programming (LSP) tackles long-horizon manipulation by optimally sequencing independently learned, task-agnostic skills without relying on explicit symbolic goals. It casts sequencing as a first-order extension of a mathematical program that maximizes the sum of skill value functions plus a final geometric evaluation. It approximates a global value-function space with Tensor Train (TT) representations and interleaves symbolic search (MCTS) with subgoal optimization (TTPI + CEM-MD). The approach yields multiple high-quality skill skeletons and subgoal sequences, and is demonstrated across non-prehensile, partly prehensile, and prehensile domains, including real-robot experiments. Its value functions also approximate cumulative rewards better than reinforcement learning (RL) baselines. By combining learning-based policies with global optimization over a compact value-function space, the work advances robust long-horizon planning without symbolic goals, enabling feasible and near-optimal execution despite contact uncertainty and disturbances.
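To make the objective "sum of skill value functions plus a final geometric evaluation" concrete, here is a minimal toy sketch, not the authors' implementation. It assumes 1-D states, three hypothetical skills whose value functions are hand-written quadratics (`value_fns`, parametrized here as `V(x_start, x_subgoal)`), and a hypothetical target `GOAL`. It also swaps in simpler techniques: exhaustive enumeration of short skeletons instead of MCTS, and a plain cross-entropy method (CEM) instead of CEM-MD for the subgoals of each skeleton.

```python
import itertools
import numpy as np

GOAL = 3.0  # hypothetical target configuration for the final geometric evaluation

# Hypothetical skill value functions V(x_start, x_subgoal); higher is better.
# Each toy skill "prefers" a particular displacement.
value_fns = {
    "push": lambda x, y: -(y - x - 1.0) ** 2,
    "pull": lambda x, y: -(y - x + 1.0) ** 2,
    "pick": lambda x, y: -0.5 * (y - x - 2.0) ** 2 - 0.5,
}

def plan_score(skeleton, subgoals, x0):
    """Sum of skill values along the plan plus a final geometric term."""
    total, x = 0.0, x0
    for skill, y in zip(skeleton, subgoals):
        total += value_fns[skill](x, y)
        x = y
    return total - (x - GOAL) ** 2  # final geometric evaluation

def cem_subgoals(skeleton, x0, iters=50, pop=200, elite=20, seed=0):
    """Plain cross-entropy method over the subgoal sequence of one skeleton."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(len(skeleton)), np.full(len(skeleton), 3.0)
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, len(skeleton)))
        scores = np.array([plan_score(skeleton, s, x0) for s in samples])
        elites = samples[np.argsort(scores)[-elite:]]  # keep the best samples
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu, plan_score(skeleton, mu, x0)

def best_plans(x0, max_len=2, top_k=3):
    """Enumerate skeletons (stand-in for MCTS), optimize subgoals, rank plans."""
    results = []
    for n in range(1, max_len + 1):
        for skeleton in itertools.product(value_fns, repeat=n):
            subgoals, score = cem_subgoals(skeleton, x0)
            results.append((score, skeleton, subgoals))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]

for score, skeleton, subgoals in best_plans(x0=0.0):
    print(f"{score:.3f}", skeleton, np.round(subgoals, 2))
```

Returning the `top_k` plans rather than a single one mirrors the idea of producing several good skeletons. With these toy values, the top plan is two consecutive pushes (score about -1/3), followed by the push/pick and pick/push skeletons, which tie at -0.5.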
Abstract
Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use tensor train factorization to construct the value function space, and alternate between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions approximate cumulative rewards better than state-of-the-art reinforcement learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.
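The abstract relies on tensor train factorization to represent value functions compactly. As a hedged illustration of what a TT representation is, the sketch below applies the classic TT-SVD algorithm (sequential truncated SVDs) to a small, fully tabulated grid and evaluates entries as products of core slices. TT-SVD is not necessarily how LSP builds its value functions: it needs the full tensor and is only practical for small grids. The quadratic `V` is a hypothetical stand-in for a skill value function, and `tt_svd` and `tt_eval` are illustrative helper names.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose a d-way array into TT cores via sequential truncated SVDs."""
    shape = tensor.shape
    d = len(shape)
    cores, rank = [], 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        # Keep singular values above a relative threshold (the TT rank).
        r_new = max(1, int(np.sum(s > eps * s[0])))
        cores.append(u[:, :r_new].reshape(rank, shape[k], r_new))
        # Carry the remainder forward and fold in the next tensor mode.
        mat = (s[:r_new, None] * vt[:r_new]).reshape(r_new * shape[k + 1], -1)
        rank = r_new
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_eval(cores, index):
    """Evaluate one tensor entry as a product of core slices."""
    vec = np.ones((1,))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return float(vec[0])

# Hypothetical stand-in for a skill value function on a 20x20x20 state grid.
g = np.linspace(-1.0, 1.0, 20)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
V = -((X - 0.5) ** 2 + (Y + 0.2) ** 2 + Z ** 2)

cores = tt_svd(V)
print([c.shape for c in cores])  # -> [(1, 20, 2), (2, 20, 2), (2, 20, 1)]
```

Because this toy `V` is a sum of one-dimensional terms, every TT rank is 2, so the cores hold 160 numbers versus 8,000 for the full table while reproducing it to machine precision.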
