SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models

Yi Wu, Zikang Xiong, Yiran Hu, Shreyash S. Iyengar, Nan Jiang, Aniket Bera, Lin Tan, Suresh Jagannathan

TL;DR

SELP tackles the challenge of safe and efficient long-horizon robotic planning under complex natural-language commands by fusing three techniques: equivalence voting to robustly translate NL to LTL specifications, LTL-enforced constrained decoding to prune unsafe plan actions via a Büchi automaton, and domain-specific fine-tuning to bias planners toward efficient, safe plans. The approach yields two new datasets, DroneNav and TabletopManip, and demonstrates superior safety and speed over state-of-the-art LLM planners across drone navigation and tabletop manipulation tasks, with notable improvements in translation accuracy thanks to voting. Key contributions include a robust NL-to-LTL translation method, a practical constrained decoding mechanism that enforces temporal logic during inference, and a fine-tuning strategy that aligns planning with safety and efficiency goals. The work’s results suggest SELP’s techniques generalize across domains and offer a tangible path toward reliable NL-driven robotic planning in real-world settings; future work will explore energy efficiency and multi-modal perception integration.

Abstract

Despite significant advancements in large language models (LLMs) that enhance robot agents' understanding and execution of natural language (NL) commands, ensuring the agents adhere to user-specified constraints remains challenging, particularly for complex commands and long-horizon tasks. To address this challenge, we present three key insights (equivalence voting, constrained decoding, and domain-specific fine-tuning) that significantly enhance LLM planners' capability in handling complex tasks. Equivalence voting ensures consistency by sampling multiple Linear Temporal Logic (LTL) formulas from NL commands, grouping equivalent LTL formulas, and selecting a formula from the majority group as the final LTL formula. Constrained decoding then uses the generated LTL formula to constrain the autoregressive inference of plans, ensuring the generated plans conform to the LTL formula. Domain-specific fine-tuning customizes LLMs to produce safe and efficient plans within specific task domains. Our approach, Safe Efficient LLM Planner (SELP), combines these insights to create LLM planners that generate plans adhering to user commands with high confidence. We demonstrate the effectiveness and generalizability of SELP across different robot agents and tasks, including drone navigation and robot manipulation. For drone navigation tasks, SELP outperforms state-of-the-art planners by 10.8% in safety rate (i.e., finishing tasks conforming to NL commands) and by 19.8% in plan efficiency. For robot manipulation tasks, SELP achieves a 20.4% improvement in safety rate. Our datasets for evaluating NL-to-LTL and robot task planning will be released at github.com/lt-asset/selp.
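
The equivalence-voting step described in the abstract (sample several candidate formulas, group the equivalent ones, keep the majority group) can be illustrated with a minimal sketch. This is not the authors' implementation. Deciding equivalence of real LTL formulas needs an automata-based checker, which is outside the scope of this example, so the sketch plugs in a propositional truth-table check as a stand-in. The names `equivalence_vote`, `prop_equivalent`, and the tuple encoding of formulas are all hypothetical.

```python
from itertools import product

def evaluate(formula, env):
    """Evaluate a propositional formula given as an atom name or a nested tuple."""
    if isinstance(formula, str):
        return env[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], env)
    if op == "and":
        return all(evaluate(a, env) for a in args)
    if op == "or":
        return any(evaluate(a, env) for a in args)
    if op == "implies":
        return (not evaluate(args[0], env)) or evaluate(args[1], env)
    raise ValueError(f"unknown operator: {op}")

def atoms(formula):
    """Collect the atom names that occur in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(a) for a in formula[1:]))

def prop_equivalent(f, g):
    """Stand-in equivalence check (assumption): compare full truth tables.
    A real system would need an LTL equivalence checker instead."""
    names = sorted(atoms(f) | atoms(g))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True

def equivalence_vote(candidates, equivalent):
    """Group candidates into equivalence classes and return a
    representative of the largest class (first one wins on ties)."""
    groups = []
    for formula in candidates:
        for group in groups:
            if equivalent(formula, group[0]):
                group.append(formula)
                break
        else:
            groups.append([formula])
    return max(groups, key=len)[0]

# Three hypothetical sampled translations; the first two are equivalent.
candidates = [
    ("implies", "a", "b"),
    ("or", ("not", "a"), "b"),
    ("and", "a", "b"),
]
winner = equivalence_vote(candidates, prop_equivalent)
```

Passing the equivalence check in as a function keeps the voting logic independent of how equivalence is decided, so the stand-in could be swapped for a proper LTL checker without touching `equivalence_vote`.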

Paper Structure

This paper contains 17 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Motivating Example: given a drone navigation task (a), SELP generates a safe and efficient plan (green box in (b)), while GPT-4 generates an unsafe plan that violates the constraints highlighted in red and blue. (c) shows two safe trajectories generated by SELP from our plan in (b): (1) with domain-specific fine-tuning (green), and (2) without domain-specific fine-tuning (black). The execution times of the green and black trajectories are 359.56 s and 551.86 s, respectively.
  • Figure 2: The High-Level Framework of SELP. NL instructions are fed to (a) an LTL translator that builds LTL formulas and (b) a planner that generates the probability distribution of each plan step. SELP enforces consistency between the generated LTL formula and the plans sampled from the probability distribution by turning the LTL formula into a Büchi automaton, which monitors and masks out invalid plans. Finally, the plan consistent with the LTL formula is executed in the simulator.
  • Figure 3: LTL Translation: Equivalence Voting
  • Figure 4: Example of how an LTL automaton constrains the inference of the LLM planner. An LLM planner (the purple box) predicts the probability of the next tokens, and a Büchi automaton (the yellow box) checks whether these tokens will result in any invalid states and prevents the planner from sampling the tokens that violate constraints.
  • Figure 5: Tabletop Manipulation
  • ...and 2 more figures
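
The masking mechanism described in the captions of Figures 2 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it works on whole actions rather than LLM tokens, it uses a small hand-written finite-state monitor instead of a Büchi automaton built from an LTL formula, and it only rejects prefixes that already violate the constraint. The room names, probabilities, and function names are hypothetical.

```python
def mask_and_renormalize(probs, state, transitions):
    """Zero out actions with no transition from `state`, then renormalize."""
    allowed = {a: p for a, p in probs.items() if (state, a) in transitions}
    total = sum(allowed.values())
    if total == 0:
        raise ValueError(f"no action satisfies the constraint in state {state!r}")
    return {a: p / total for a, p in allowed.items()}

def constrained_greedy_decode(step_probs, transitions, start):
    """Pick the most likely allowed action at each step while the
    monitor tracks which state the partial plan has reached."""
    state = start
    plan = []
    for probs in step_probs:
        masked = mask_and_renormalize(probs, state, transitions)
        action = max(masked, key=masked.get)
        plan.append(action)
        state = transitions[(state, action)]
    return plan, state

# Hypothetical monitor for "do not enter room B until room A has been visited".
# A missing (state, action) pair means the action would violate the constraint.
transitions = {
    ("need_A", "goto_A"): "A_done",
    ("need_A", "goto_C"): "need_A",
    ("A_done", "goto_A"): "A_done",
    ("A_done", "goto_B"): "A_done",
    ("A_done", "goto_C"): "A_done",
}

# Hypothetical per-step planner distributions; the unconstrained planner
# would choose goto_B first, which the monitor forbids.
step_probs = [
    {"goto_A": 0.3, "goto_B": 0.6, "goto_C": 0.1},
    {"goto_A": 0.1, "goto_B": 0.8, "goto_C": 0.1},
]

first_step = mask_and_renormalize(step_probs[0], "need_A", transitions)
plan, final_state = constrained_greedy_decode(step_probs, transitions, "need_A")
```

In this toy run the forbidden action is removed before selection, so the plan visits room A first and only then room B. Greedy selection is used here for determinism; sampling from the masked distribution would follow the same pattern.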