Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition

Minseo Kwon, Yaesol Kim, Young J. Kim

TL;DR

This work addresses the bottleneck of long-horizon robotic task planning by introducing a neuro-symbolic planner that uses LLMs in two roles: as an L-Model (for multi-level subgoal decomposition) and as an L-Policy (to drive subgoal planning via Monte Carlo Tree Search). Planning is formulated as a multi-valued planning task $P \equiv \langle \mathcal{S}, \mathcal{O}, \mathcal{A}, \mathcal{T}, s_0, S^\star \rangle$, with subgoals generating subproblems $P_i$ that are solved either symbolically or with MCTS-LLM, depending on their complexity. The pipeline combines planning formulation from multimodal perception, subgoal generation with LLMs, and a selective subproblem solver (symbolic or MCTS-LLM), and finally concatenates the subplans into a complete PDDL-based plan. Across three IPC domains and in real-world and simulated robot tests, the approach yields high success rates (up to $100\%$) and demonstrates favorable scalability and robustness, supported by ablations showing the effectiveness of goal decomposition and by real-robot demonstrations. The work advances scalable, robust task planning for robotics by leveraging subgoal decomposition and a hybrid search strategy that fuses symbolic and LLM-guided planning.
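The decompose, dispatch, and concatenate loop summarized above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: states are sets of ground facts, `bfs_solver` stands in for a symbolic planner, `complex_solver` is a placeholder for the MCTS-LLM planner, and `is_complex` is a hypothetical complexity test. The subgoals are passed in as a list, whereas in the described pipeline they come from an LLM.

```python
from collections import deque
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

State = FrozenSet[str]
Step = Tuple[str, State]  # (action name, resulting state)
# successors(state) yields (action, next_state); a stand-in for the transition model.
Successors = Callable[[State], Iterable[Step]]
Solver = Callable[[State, State, Successors], Optional[List[Step]]]


def bfs_solver(start: State, goal: State, successors: Successors) -> Optional[List[Step]]:
    """Breadth-first search; a stand-in for a symbolic planner (finite spaces only)."""
    if goal <= start:
        return []
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        for action, nxt in successors(state):
            if nxt in seen:
                continue
            steps = path + [(action, nxt)]
            if goal <= nxt:
                return steps
            seen.add(nxt)
            frontier.append((nxt, steps))
    return None


def plan_by_decomposition(
    s0: State,
    subgoals: List[State],
    successors: Successors,
    simple_solver: Solver,
    complex_solver: Solver,
    is_complex: Callable[[State, State], bool],
) -> Optional[List[str]]:
    """Solve each subproblem from the state the previous one ended in, then concatenate."""
    plan: List[str] = []
    state = s0
    for goal in subgoals:
        solver = complex_solver if is_complex(state, goal) else simple_solver
        steps = solver(state, goal, successors)
        if steps is None:
            return None  # one unsolved subproblem fails the whole plan
        for action, nxt in steps:
            plan.append(action)
            state = nxt
    return plan


# Toy domain (hypothetical): each action "set-x" adds the fact x.
def toy_successors(state: State) -> Iterable[Step]:
    for fact in ("a", "b", "c"):
        if fact not in state:
            yield (f"set-{fact}", state | {fact})


toy_plan = plan_by_decomposition(
    frozenset(),
    [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})],
    toy_successors,
    simple_solver=bfs_solver,
    complex_solver=bfs_solver,  # placeholder where an MCTS-LLM planner would go
    is_complex=lambda state, goal: len(goal - state) > 1,  # hypothetical threshold
)
print(toy_plan)
```

The point of the sketch is that each subproblem starts from the end state of the previous one, so every individual search covers only a short horizon.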

Abstract

In robotic task planning, symbolic planners using rule-based representations like PDDL are effective but struggle with long-sequential tasks in complicated environments due to an exponentially growing search space. Meanwhile, LLM-based approaches, which are grounded in artificial neural networks, offer faster inference and commonsense reasoning but suffer from lower success rates. To address the limitations of current symbolic approaches (slow speed) and LLM-based approaches (low accuracy), we propose a novel neuro-symbolic task planner that decomposes complex tasks into subgoals using an LLM and carries out task planning for each subgoal using either a symbolic or an MCTS-based LLM planner, depending on the subgoal complexity. This decomposition reduces planning time and improves success rates by narrowing the search space and enabling LLMs to focus on more manageable tasks. Our method significantly reduces planning time while maintaining high success rates across task planning domains, as well as real-world and simulated robotics environments. More details are available at http://graphics.ewha.ac.kr/LLMTAMP/.

Paper Structure (25 sections, 2 equations, 6 figures)

Figures (6)

  • Figure 1: Neuro-symbolic task planning pipeline. LLM (the green blocks) and symbolic languages (the orange blocks) are used for various steps in the pipeline.
  • Figure 2: An overview of the MCTS LLM Planner. First, the L-Policy samples $n_s$ plans for a sub-problem $P_i$. For instance, the initial state $s_i$ of $P_i$ is (on b1 b2)(on-table b2 t1), etc., and the goal state $S^\star_{i+1}$ should satisfy (clear b1)(clear b2)(clear-table t1). A state tree $T_i$ is then generated, and our MCTS algorithm uses $T_i$ to search for a plan that reaches $S^\star_{i+1}$.
  • Figure 3: Success rates (top row) and planning time (bottom row) of CoT, FD, Symbolic LLM, MCTS LLM planners with $3\le n_s \le 5$, and MCTS LLM planner without goal decomposition with $n_s=5$. The $x$ axis in all the graphs denotes the domain complexity $n$.
  • Figure 4: Physical robotic demonstration of our planner on the Blocksworld-new domain. Initially, ten blocks, labeled from 1 to 10, are divided into three stacks and placed on the table (leftmost image). The goal is to restack the blocks at the same position in the following order: 10 on 7, 7 on 9, 9 on 8, 1 on 3, 3 on 2, 6 on 5, and 5 on 4 (rightmost image).
  • Figure 5: Simulated robotic demonstration of our planner on the Barman-new domain. Initially, three ingredients, three shots, and a shaker are placed on the table (leftmost image). The goal is to make a cocktail and pour it into a shot (rightmost image).
  • ...and 1 more figures
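
The Figure 2 caption describes sampling $n_s$ plans for a subproblem and merging them into a state tree that is then searched. The sketch below illustrates only the tree-building idea, under stated assumptions: the sampled plans are hard-coded lists rather than LLM outputs, actions follow a simple STRIPS-style model (preconditions, add effects, delete effects), and the search is a plain breadth-first traversal swapped in for the paper's MCTS to keep the example short. All action names, facts, and helper names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# action name -> (preconditions, add effects, delete effects)
ActionModel = Dict[str, Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]]


@dataclass
class Node:
    state: State
    children: Dict[str, "Node"] = field(default_factory=dict)


def apply_action(state: State, name: str, model: ActionModel) -> Optional[State]:
    """Return the successor state, or None if the action is unknown or inapplicable."""
    if name not in model:
        return None
    pre, add, delete = model[name]
    if not pre <= state:
        return None
    return (state - delete) | add


def build_state_tree(start: State, sampled_plans: List[List[str]], model: ActionModel) -> Node:
    """Merge sampled plans into one tree; shared prefixes share nodes."""
    root = Node(start)
    for plan in sampled_plans:
        node = root
        for name in plan:
            if name not in node.children:
                nxt = apply_action(node.state, name, model)
                if nxt is None:
                    break  # drop the invalid remainder of this sample
                node.children[name] = Node(nxt)
            node = node.children[name]
    return root


def find_plan(root: Node, goal: State) -> Optional[List[str]]:
    """Breadth-first traversal of the tree (used here in place of MCTS)."""
    frontier = deque([(root, [])])
    while frontier:
        node, path = frontier.popleft()
        if goal <= node.state:
            return path
        for name, child in node.children.items():
            frontier.append((child, path + [name]))
    return None


fs = frozenset
model: ActionModel = {
    "unstack b1 b2": (fs({"on b1 b2", "clear b1", "handempty"}),
                      fs({"holding b1", "clear b2"}),
                      fs({"on b1 b2", "clear b1", "handempty"})),
    "put-down b1": (fs({"holding b1"}),
                    fs({"on-table b1", "clear b1", "handempty"}),
                    fs({"holding b1"})),
    "pick-up b2": (fs({"on-table b2", "clear b2", "handempty"}),
                   fs({"holding b2"}),
                   fs({"on-table b2", "clear b2", "handempty"})),
}
start = fs({"on b1 b2", "on-table b2", "clear b1", "handempty"})
samples = [
    ["pick-up b2", "put-down b1"],     # invalid first step: b2 is not clear
    ["unstack b1 b2", "put-down b1"],  # valid
    ["unstack b1 b2", "pick-up b2"],   # second step invalid: hand is not empty
]
tree = build_state_tree(start, samples, model)
found = find_plan(tree, fs({"on-table b1", "clear b2"}))
print(found)
```

Even when individual samples contain inapplicable actions, their valid prefixes still contribute nodes, so a goal-reaching path can be recovered from the merged tree.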