Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation

Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu

TL;DR

This work tackles the problem of instructing multiple robots with long-horizon tasks expressed in natural language. It introduces Nl2Hltl2Plan, a two-stage neuro-symbolic pipeline that first extracts a Hierarchical Task Tree (HTT) from language and then translates subtasks into hierarchical LTL (HLTL, a hierarchical form of syntactically co-safe LTL) suitable for off-the-shelf planners. By grounding language in hierarchical temporal logic and using HTTs to maintain task structure, the approach achieves higher success rates and lower planning costs in both simulated AI2-THOR scenarios and real-world tabletop setups, including multi-robot handovers. The results support HLTL as an interpretable, scalable intermediate representation for multi-robot planning driven by natural language, with clear pathways for improving robustness via syntax/semantic checking and HTT reorganization. This work advances the integration of LLMs with formal planning for multi-robot systems, enabling more complex, user-friendly, and efficient task execution in real-world environments.

Abstract

To enable non-experts to specify long-horizon, multi-robot collaborative tasks, language models are increasingly used to translate natural language commands into formal specifications. However, because translation can occur in multiple ways, such translations may lack accuracy or lead to inefficient multi-robot planning. Our key insight is that concise hierarchical specifications can simplify planning while remaining straightforward to derive from human instructions. We propose Nl2Hltl2Plan, a framework that translates natural language commands into hierarchical Linear Temporal Logic (LTL) and solves the corresponding planning problem. The translation involves two steps leveraging Large Language Models (LLMs). First, an LLM transforms instructions into a Hierarchical Task Tree, capturing logical and temporal relations. Next, a fine-tuned LLM converts sub-tasks into flat LTL formulas, which are aggregated into hierarchical specifications, with the lowest level corresponding to ordered robot actions. These specifications are then used with off-the-shelf planners. Our Nl2Hltl2Plan demonstrates the potential of LLMs in hierarchical reasoning for multi-robot task planning. Evaluations in simulation and real-world experiments with human participants show that Nl2Hltl2Plan outperforms existing methods, handling more complex instructions while achieving higher success rates and lower costs in task allocation and planning. Additional details are available at https://nl2hltl2plan.github.io .
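The two-step translation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `TaskNode` class, its field names, the `flat_formula` encoding (`F x` for "eventually x", `!b U a` for "b not before a"), and the example tree are all hypothetical, and the tree is hand-written here rather than produced by an LLM. The sketch only shows how a tree of subtasks with ordering constraints could yield one flat co-safe LTL formula per non-leaf node, which together form a hierarchical specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskNode:
    """One node of a hypothetical Hierarchical Task Tree (HTT)."""
    name: str          # identifier, reused as an atomic proposition
    description: str   # natural-language description of the subtask
    children: List["TaskNode"] = field(default_factory=list)
    # Pairs (i, j): child i must be completed before child j.
    precedence: List[Tuple[int, int]] = field(default_factory=list)


def flat_formula(node: TaskNode) -> str:
    """Build a flat co-safe LTL string over the children of one node.

    Every child must eventually hold (F child); each precedence pair
    (i, j) adds the constraint (!child_j U child_i).
    """
    parts = [f"F {c.name}" for c in node.children]
    parts += [
        f"(!{node.children[j].name} U {node.children[i].name})"
        for i, j in node.precedence
    ]
    return " & ".join(parts)


def hierarchical_spec(root: TaskNode) -> Dict[str, str]:
    """Collect one flat formula per non-leaf node, keyed by node name."""
    spec: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            spec[node.name] = flat_formula(node)
            stack.extend(node.children)
    return spec


# Hand-written tree for one possible reading of the instruction
# "First, put the keychains on the armchair. Retrieve a pencil and put it
# on the side table. Move the phone and the bat to the bed in any order."
phone_bat = TaskNode(
    "phone_bat", "move the phone and the bat to the bed in any order",
    children=[TaskNode("phone", "move the phone to the bed"),
              TaskNode("bat", "move the bat to the bed")],
)
root = TaskNode(
    "task", "full instruction",
    children=[TaskNode("keychains", "put the keychains on the armchair"),
              TaskNode("pencil", "put a pencil on the side table"),
              phone_bat],
    precedence=[(0, 1), (0, 2)],  # assumed: keychains come first
)

spec = hierarchical_spec(root)
for name, formula in spec.items():
    print(f"{name}: {formula}")
```

Keeping one short formula per non-leaf node, rather than a single formula over all atomic propositions, reflects the abstract's point that concise hierarchical specifications can simplify planning.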

Paper Structure (24 sections, 2 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: A sequence of images, arranged from left to right and top to bottom, depicts the task "First, put a set of keychains on the armchair. Retrieve a pencil and put it on the side table. Move the phone and the bat to the bed in any order". Objects and their trajectories are marked in different colors: keychains (red), bat (blue), pencil (purple), and phone (green). $t$ denotes the discrete time step in the simulation.
  • Figure 2: Overview of the framework Nl2Hltl2Plan. The non-leaf nodes in the Hierarchical Task Tree (see Section \ref{sec:htt}), the language descriptions of subtasks, and the flat specifications are color-coded to indicate one-to-one correspondence. Summary snippets of the prompts are provided, with more information accessible on the project page https://nl2hltl2plan.github.io.
  • Figure 3: Comparison of pipelines from natural language to plans between Smart-Llm and Nl2Hltl2Plan.
  • Figure 4: Comparative snapshots between Nl2Hltl2Plan and an LLM for task 6. Nl2Hltl2Plan generates an optimal trajectory, whereas the LLM follows the sequence in which the fruits are mentioned in the instructions.
  • Figure 5: Four robot arms in straight-line or square configurations, where symbols $E$, $F$, and $G$ represent source locations and $H$, $I$, and $J$ denote target locations.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 3.1: Hierarchical sc-LTL \cite{luo2024simultaneous}
  • Example 1: Dishwasher Loading Problem
  • Definition 4.1: Hierarchical Task Tree (HTT)
  • Remark 4.2