Generating Executable Action Plans with Environmentally-Aware Language Models

Maitrey Gramopadhye, Daniel Szafir

TL;DR

This work tackles the mismatch between LLM-generated action plans and real-world executability by grounding plans in the agent's environment. It introduces a two-LLM system that uses environment graphs and object relations to prompt plan generation, coupled with a multi-score ranking mechanism that maps steps to admissible actions and environment objects without fine-tuning. Empirical results on VirtualHome show substantial improvements in executability and final task correctness over a strong baseline, with ablations confirming the value of environment conditioning, object disambiguation, and model sizing. The approach offers a scalable path to more reliable robot action planning in dynamic environments and points toward real-world deployment in future work.

Abstract

Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents from high-level text queries. However, these models typically do not consider the robot's environment, resulting in generated plans that may not actually be executable due to ambiguities in the planned actions or environmental constraints. In this paper, we propose an approach to generate environmentally-aware action plans that agents are better able to execute. Our approach involves integrating environmental objects and object relations as additional inputs into LLM action plan generation to provide the system with an awareness of its surroundings, resulting in plans where each generated action is mapped to objects present in the scene. We also design a novel scoring function that, along with generating the action steps and associating them with objects, helps the system disambiguate among object instances and take into account their states. We evaluated our approach using the VirtualHome simulator and the ActivityPrograms knowledge base and found that action plans generated from our system had a 310% improvement in executability and a 147% improvement in correctness over prior work. The complete code and a demo of our method are publicly available at https://github.com/hri-ironlab/scene_aware_language_planner.
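
As a reading aid for the headline numbers, the short derivation below assumes the usual relative-change convention for "improvement". The abstract does not state the convention or the absolute scores, so the symbols s_ours and s_base are placeholders and only the ratios can be derived.

```latex
% Assumption: "p% improvement" means relative change over the baseline score.
\[
  p = \frac{s_{\text{ours}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
  \quad\Longrightarrow\quad
  s_{\text{ours}} = \left(1 + \frac{p}{100}\right) s_{\text{base}}
\]
% Executability: p = 310  =>  s_ours = 4.10 * s_base
% Correctness:   p = 147  =>  s_ours = 2.47 * s_base
```

Under that assumption, the reported figures correspond to roughly 4.1 times the baseline executability and about 2.5 times the baseline correctness.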

Paper Structure (25 sections, 8 equations, 3 figures, 9 tables, 1 algorithm)

This paper contains 25 sections, 8 equations, 3 figures, 9 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Visualization of an example action plan being executed in VirtualHome. Within the virtual home environment a simulated humanoid agent carries out the robot task sequences generated by our environmentally-aware language model.
  • Figure 2: An overview of our approach. We generate action plans by first selecting an example that has a similar task and environment to the query. We use this example to autoregressively prompt the Planning LM to generate an action plan and map the output to admissible actions and objects using the Translation LM.
  • Figure 3: Example plans generated by our system. For each action step, matched environment objects with ids are identified in brackets. Our system can handle plans containing actions with multiple objects (e.g., pillow and bed) and can consider multiple objects of the same name (e.g., curtain).
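
The translation step described in the Figure 2 caption (mapping a free-form generated step to an admissible action and a scene object) can be illustrated with a minimal sketch. This is not the authors' implementation. The action list, the `SceneObject` type, the object ids, and the output format are all hypothetical, and a simple word-overlap score stands in for whatever similarity the Translation LM provides.

```python
from dataclasses import dataclass

# Hypothetical admissible actions; the simulator's real action set is not listed here.
ACTIONS = ["walk to", "grab", "open", "close", "switch on", "put on"]


@dataclass(frozen=True)
class SceneObject:
    """A hypothetical environment object: a name plus a unique instance id."""
    name: str
    obj_id: int


def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def similarity(a, b):
    """Jaccard word overlap; an assumed stand-in for an embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def translate_step(step, scene):
    """Return (score, action, object) for the best-matching action/object pair."""
    best = None
    for action in ACTIONS:
        for obj in scene:
            candidate = f"{action} {obj.name}"
            score = similarity(step, candidate)
            # Strict '>' keeps the first candidate on ties.
            if best is None or score > best[0]:
                best = (score, action, obj)
    return best


# Toy scene with two objects of the same name, as in the Figure 3 caption.
scene = [
    SceneObject("curtain", 31),
    SceneObject("curtain", 32),
    SceneObject("bed", 12),
    SceneObject("pillow", 45),
]

score, action, obj = translate_step("open the curtain", scene)
print(f"{action} <{obj.name}> ({obj.obj_id})  score={score:.2f}")
```

Both curtains receive the same text score here, so the sketch simply returns the first one. Choosing between such instances using object relations and states is the role the abstract assigns to the paper's scoring function, which this example does not attempt to reproduce.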