Language Models can Infer Action Semantics for Symbolic Planners from Environment Feedback

Wang Zhu, Ishika Singh, Robin Jia, Jesse Thomason

TL;DR

Predicting Semantics of Actions with Language Models (PSALM) automatically learns action semantics by leveraging the strengths of both symbolic planners and LLMs, and explores the environment more efficiently than prior work to infer ground truth domain action semantics.

Abstract

Symbolic planners can discover a sequence of actions from initial to goal states given expert-defined, domain-specific logical action semantics. Large Language Models (LLMs) can directly generate such sequences, but limitations in reasoning and state-tracking often result in plans that are insufficient or unexecutable. We propose Predicting Semantics of Actions with Language Models (PSALM), which automatically learns action semantics by leveraging the strengths of both symbolic planners and LLMs. PSALM repeatedly proposes and executes plans, using the LLM to partially generate plans and to infer domain-specific action semantics based on execution outcomes. PSALM maintains a belief over possible action semantics that is iteratively updated until a goal state is reached. Experiments on 7 environments show that when learning just from one goal, PSALM boosts plan success rate from 36.4% (on Claude-3.5) to 100%, and explores the environment more efficiently than prior work to infer ground truth domain action semantics.

Paper Structure (41 sections, 1 equation, 19 figures, 4 tables)

Figures (19)

  • Figure 1: LLMs can propose plans and generate action semantics, but struggle with state tracking. Symbolic planners leverage specialized search algorithms, but require predefined action semantics for the environment. PSALM integrates the strengths of both.
  • Figure 2: An example of symbolic planning information from the BlocksW domain, from left to right: PDDL domain file, PDDL problem file, visualization of initial and goal state for block stacking, and a potential plan.
  • Figure 3: The pipeline of PSALM in four steps: (1) sample trajectories from a trajectory sampler; (2) execute the trajectories in the environment to get feedback; (3) generate action semantics for each action from the environment feedback, and update the memory based on the prediction; (4) sample action semantics from the memory to construct the domain file, so the symbolic solver can check whether a plan succeeds.
  • Figure 4: We compare PSALM with multiple variations over 7 domains. We report on NES and the results suggest (1) LLM as a trajectory sampler greatly reduces the execution steps; (2) LLM and rule-based action semantics generators have complementary benefits; and (3) Prospection to reject trajectories based on current action semantics hypotheses is helpful overall. TS is short for trajectory sampler and ASG is short for action semantics generator.
  • Figure 5: Additional analysis for PSALM. (Left) We vary the type of LLM and show that PSALM works with GPT-3.5 and Mistral-7B on the Termes domain. (Middle) Using the LLM prior before trajectory sampling (darker bars) enables the random baselines to work better compared to not having the prior (lighter bars), though it can adversely affect the full PSALM method. (Right) Experiments where we remove the error message from the input to the LLM action semantics generator. Without error messages, PSALM works only on easy domains. For experiments that fail to find a solution to the problem, we show the action semantics accuracy above the bar.
  • ...and 14 more figures
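
The four-step loop described in the Figure 3 caption (sample, execute, update beliefs, plan and check) can be illustrated with a small sketch. This is not the authors' implementation: the LLM trajectory sampler is replaced by a random sampler and the LLM action semantics generator by a simple rule-based update, both of which the Figure 4 caption mentions only as comparison variants. The toy domain, its predicate and action names, and every function name (`apply`, `execute`, `update_belief`, `plan`, `learn`) are hypothetical and chosen for illustration. Failed actions are simply skipped here, whereas PSALM also feeds error messages to the LLM, and prospection is omitted.

```python
import random
from collections import deque

PREDICATES = frozenset({"hand_empty", "on_table", "holding", "stacked"})

# Ground truth hidden inside the environment (hypothetical toy domain):
# action -> (preconditions, add effects, delete effects).
TRUE_SEMANTICS = {
    "pickup": ({"hand_empty", "on_table"}, {"holding"}, {"hand_empty", "on_table"}),
    "stack": ({"holding"}, {"stacked", "hand_empty"}, {"holding"}),
    "putdown": ({"holding"}, {"on_table", "hand_empty"}, {"holding"}),
}
INIT = frozenset({"hand_empty", "on_table"})
GOAL = frozenset({"stacked"})


def apply(state, semantics):
    """Return the successor state, or None if a precondition is unmet."""
    pre, add, delete = semantics
    if not pre <= state:
        return None
    return frozenset((state - delete) | add)


def execute(trajectory):
    """Step 2: run actions in the true environment, stopping at the first failure."""
    state, transitions = INIT, []
    for action in trajectory:
        nxt = apply(state, TRUE_SEMANTICS[action])
        if nxt is None:
            break
        transitions.append((state, action, nxt))
        state = nxt
    return state, transitions


def update_belief(belief, transitions):
    """Step 3 (rule-based stand-in for the LLM): refine per-action hypotheses."""
    for state, action, nxt in transitions:
        pre, add, delete = belief.get(action, (PREDICATES, frozenset(), frozenset()))
        # Preconditions must hold in every state where the action succeeded;
        # effects are read off the observed state difference.
        belief[action] = (pre & state, add | (nxt - state), delete | (state - nxt))


def plan(belief):
    """Step 4: breadth-first search over states using the believed semantics."""
    queue, seen = deque([(INIT, [])]), {INIT}
    while queue:
        state, path = queue.popleft()
        if GOAL <= state:
            return path
        for action, semantics in belief.items():
            nxt = apply(state, semantics)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


def learn(seed=0, horizon=3, max_iters=1000):
    """Repeat sample / execute / update / plan until a plan reaches the goal."""
    rng = random.Random(seed)
    belief = {}
    actions = sorted(TRUE_SEMANTICS)
    for iteration in range(1, max_iters + 1):
        trajectory = [rng.choice(actions) for _ in range(horizon)]  # step 1
        _, transitions = execute(trajectory)
        update_belief(belief, transitions)
        candidate = plan(belief)
        if candidate is not None:
            final_state, done = execute(candidate)  # validate in the environment
            if len(done) == len(candidate) and GOAL <= final_state:
                return candidate, belief, iteration
    return None, belief, max_iters


if __name__ == "__main__":
    found_plan, learned, steps = learn()
    print(found_plan, steps)
```

In this toy setting the planner cannot use an action until it has been observed to succeed at least once, so exploration quality directly determines how many iterations are needed. That is the cost the paper targets by using an LLM as the trajectory sampler.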