Knowledge Model Prompting Increases LLM Performance on Planning Tasks
Erik Goh, John Kos, Ashok Goel
TL;DR
The paper investigates whether Task-Method-Knowledge (TMK) structured prompting can enhance LLM planning, specifically in the Blocksworld domain of PlanBench. By replacing the domain knowledge with a JSON-form TMK prompt, the study shows that TMK can shift models away from semantic linguistic priors toward formal, code-like symbolic reasoning, yielding large gains in planning accuracy (e.g., up to 97.3% on Random Blocksworld for some models and a maximum improvement of 65.8 percentage points on o1). The authors attribute these gains to a cognitive scaffolding effect and to a steering mechanism that grounds procedural reasoning in causal teleology, with TMK supplying explicit justifications for actions. While promising, the results also reveal model- and domain-specific limitations, underscoring the need for testing on more domains and for comparisons with other symbolic frameworks. Overall, TMK prompting offers a principled way to improve LLM planning by embedding expert-like, goal-driven structure into prompts, potentially unlocking latent symbolic capabilities in LLMs.
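The exact JSON schema of the paper's TMK prompt is not given here. As a rough sketch only, a TMK model pairs tasks (what to achieve, with a goal stating why), methods (how a task decomposes into subtasks), and knowledge (domain concepts such as Blocksworld predicates and actions), and is serialized to JSON ahead of the problem. All field names, the `build_prompt` helper, and the prompt layout below are hypothetical illustrations, not the authors' format.

```python
import json

# Hypothetical TMK-style model for Blocksworld. Field names such as "tasks",
# "methods", "knowledge", "goal", and "justification" are illustrative
# assumptions, not the schema used in the paper.
TMK_MODEL = {
    "tasks": [
        {
            "name": "achieve-goal-configuration",
            "goal": "every block is positioned as stated in the goal",  # the "why"
            "method": "build-towers-bottom-up",
        }
    ],
    "methods": [
        {
            "name": "build-towers-bottom-up",  # the "how"
            "subtasks": ["clear-target-block", "move-block-to-target"],
            "justification": "placing lower blocks first avoids undoing progress",
        }
    ],
    "knowledge": {  # domain concepts from the standard Blocksworld domain
        "predicates": ["on(x, y)", "ontable(x)", "clear(x)", "holding(x)", "handempty"],
        "actions": ["pick-up", "put-down", "stack", "unstack"],
    },
}


def build_prompt(tmk_model: dict, problem: str) -> str:
    """Prepend the JSON-serialized TMK model to a planning problem description."""
    tmk_json = json.dumps(tmk_model, indent=2)
    return f"Task-Method-Knowledge model:\n{tmk_json}\n\nProblem:\n{problem}\n\nPlan:"


if __name__ == "__main__":
    problem = ("Initial state: block a is on block b; block b is on the table. "
               "Goal: block b is on block a.")
    print(build_prompt(TMK_MODEL, problem))
```

Serializing with `json.dumps` keeps the structure machine-readable, which matches the paper's framing of TMK as a formal, code-like representation rather than free-form prose.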
Abstract
Large Language Models (LLMs) can struggle with reasoning and planning tasks. Many prompting techniques have been developed to support LLM reasoning, notably Chain-of-Thought (CoT); however, these techniques have also come under scrutiny as LLMs' ability to reason at all has been called into question. Drawing on cognitive and educational science, this paper investigates whether the Task-Method-Knowledge (TMK) framework can improve LLM reasoning capabilities beyond its previously demonstrated success in educational applications. The TMK framework's unique ability to capture causal, teleological, and hierarchical reasoning structures, combined with its explicit task decomposition mechanisms, makes it particularly well-suited to addressing language model reasoning deficiencies. Unlike other hierarchical frameworks such as Hierarchical Task Networks (HTN) and Belief-Desire-Intention (BDI), TMK explicitly represents not only what to do and how to do it, but also why actions are taken. The study evaluates TMK on the PlanBench benchmark, focusing on the Blocksworld domain to test reasoning and planning capabilities, and examines whether TMK-structured prompting helps language models decompose complex planning problems into manageable sub-tasks. The results reveal a significant performance inversion in reasoning models: with TMK prompting, the reasoning model reaches up to 97.3% accuracy on opaque, symbolic tasks (the Random versions of Blocksworld in PlanBench), where it previously failed (31.5%), suggesting that TMK could help bridge the gap between semantic approximation and symbolic manipulation. Within the scope of these experiments, our findings suggest that TMK functions not merely as context but as a mechanism that steers reasoning models away from their default linguistic modes toward formal, code-execution pathways.
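To make "opaque, symbolic tasks" concrete, the sketch below assumes that the Random variant keeps the Blocksworld structure but replaces meaningful action names with meaningless tokens, so a model cannot lean on the everyday meaning of words like "stack". The token format, the seeded generator, and the helper names `make_obfuscation_map` and `obfuscate_plan` are hypothetical; this is not PlanBench's actual generation code.

```python
import random
import string


def make_obfuscation_map(names, seed=0, length=8):
    """Map each domain name to a distinct random lowercase token (hypothetical scheme)."""
    rng = random.Random(seed)  # seeded so the renaming is reproducible
    mapping, used = {}, set()
    for name in names:
        token = "".join(rng.choices(string.ascii_lowercase, k=length))
        while token in used:  # guarantee a one-to-one renaming
            token = "".join(rng.choices(string.ascii_lowercase, k=length))
        used.add(token)
        mapping[name] = token
    return mapping


def obfuscate_plan(plan, mapping):
    """Rename the action in each (action, *args) step; block arguments are kept."""
    return [(mapping[action], *args) for action, *args in plan]


ACTIONS = ["pick-up", "put-down", "stack", "unstack"]
mapping = make_obfuscation_map(ACTIONS)

# A valid plan for: a on b, b on table  ->  b on a.
plan = [("unstack", "a", "b"), ("put-down", "a"), ("pick-up", "b"), ("stack", "b", "a")]
print(obfuscate_plan(plan, mapping))
```

Because the renaming is one-to-one, the obfuscated plan is valid exactly when the original is; only the surface vocabulary changes, which is why success on such variants is read as evidence of symbolic rather than semantic reasoning.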
