TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations
Sudhir Agarwal, Anu Sreepathy
TL;DR
The paper addresses the challenge of generating accurate task PDDLs from natural language planning requests by combining LLM capabilities with symbolic reasoning and classical planning. The Translate-Infer-Compile (TIC) pipeline confines the LLM to producing a logically interpretable intermediate representation, augments that representation with domain-knowledge inferences computed by an Answer Set Programming (ASP) solver, and then deterministically compiles the result to PDDL for planning. Empirical results across seven planning domains show that TIC achieves high accuracy, including in scenarios with language variation, outperforms the end-to-end LLM+P approach, and generalizes well with generic prompts. The approach extends to other structured-task problems, suggesting broad applicability to tool use, API calls, and database queries; future directions include alternative logical formalisms such as Description Logics/OWL.
Abstract
We study the problem of generating plans for natural language planning task requests. On the one hand, LLMs excel at natural language processing but do not perform well on planning. On the other hand, classical planning tools excel at planning tasks but require input in a structured language such as the Planning Domain Definition Language (PDDL). We leverage the strengths of both techniques by using an LLM to generate the PDDL representation (task PDDL) of a planning task request and then using a classical planner to compute a plan. Unlike previous approaches that use LLMs to generate task PDDLs directly, our approach comprises (a) translate: using an LLM only to generate a logically interpretable intermediate representation of the natural language task description, (b) infer: deriving additional logically dependent information from the intermediate representation using a logic reasoner (currently, an Answer Set Programming solver), and (c) compile: generating the target task PDDL from the base and inferred information. We observe that restricting the LLM to outputting only the intermediate representation significantly reduces LLM errors. Consequently, the TIC approach achieves, for at least one LLM, high accuracy on task PDDL generation for all seven domains of our evaluation dataset.
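The three stages can be illustrated with a minimal Python sketch on a Blocksworld-style request. It is not the paper's implementation and rests on several assumptions: the intermediate representation is a hypothetical set of fact tuples, the LLM call in `translate` is replaced by a hard-coded output so the sketch runs offline, and the ASP solver is replaced by a hand-coded, two-stratum evaluation of a few rules. For stratified programs such as these rules, the unique ASP answer set coincides with this stratified evaluation, but a real ASP solver supports far richer programs. Predicate names follow the classic Blocksworld domain; the function names, domain name, and problem name are illustrative.

```python
from dataclasses import dataclass, field

# A fact is a predicate name followed by its arguments, e.g. ("on", "a", "b").
Fact = tuple[str, ...]


@dataclass
class IntermediateRep:
    """Hypothetical logically interpretable intermediate representation (IR)."""
    init: set[Fact] = field(default_factory=set)
    goal: set[Fact] = field(default_factory=set)


def translate(request: str) -> IntermediateRep:
    """Stand-in for the LLM 'translate' step.

    A real pipeline would prompt an LLM to emit the IR for `request`; here the
    IR for the example request is hard-coded (assumption) so the sketch runs.
    """
    return IntermediateRep(
        init={("on", "a", "b"), ("ontable", "b"), ("ontable", "c")},
        goal={("on", "c", "a")},
    )


def infer(ir: IntermediateRep) -> set[Fact]:
    """Derive facts the IR leaves implicit (simplified stand-in for ASP)."""
    facts = set(ir.init)
    # Stratum 1 (positive rules): every mentioned block is an object.
    #   block(X) :- on(X, _).  block(Y) :- on(_, Y).  block(X) :- ontable(X).
    blocks: set[str] = set()
    for f in facts | ir.goal:
        if f[0] in ("on", "ontable"):
            blocks.update(f[1:])
    facts |= {("block", b) for b in blocks}
    # Stratum 2 (negation as failure): clear(X) :- block(X), not on(_, X).
    covered = {f[2] for f in facts if f[0] == "on"}
    facts |= {("clear", b) for b in blocks - covered}
    # Rule with an empty body: the hand starts empty.
    facts.add(("handempty",))
    return facts


def compile_pddl(name: str, domain: str, facts: set[Fact], goal: set[Fact]) -> str:
    """Deterministically render base + inferred facts as a PDDL problem."""
    def atom(f: Fact) -> str:
        return "(" + " ".join(f) + ")"

    # block/1 is only used to collect objects; it is not a domain predicate.
    objects = sorted(f[1] for f in facts if f[0] == "block")
    init = sorted(atom(f) for f in facts if f[0] != "block")
    goals = sorted(atom(f) for f in goal)
    return (
        f"(define (problem {name}) (:domain {domain})\n"
        f"  (:objects {' '.join(objects)})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal (and {' '.join(goals)})))"
    )


if __name__ == "__main__":
    ir = translate("Put block c on a. Block a is on b; b and c are on the table.")
    print(compile_pddl("stack-c-on-a", "blocksworld", infer(ir), ir.goal))
```

The inferred `clear` and `handempty` facts never appear in the request, which is the kind of logically dependent information the infer stage supplies instead of asking the LLM to produce it. The resulting problem file would then be passed, together with a domain PDDL, to a classical planner.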
