Improving Planning with Large Language Models: A Modular Agentic Architecture
Taylor Webb, Shanka Subhra Mondal, Ida Momennejad
TL;DR
This paper addresses the challenge of multi-step planning with large language models by proposing MAP, a modular, agentic architecture that decomposes planning into specialized LLM-powered components. MAP plans through recurrent interactions among modules, each driven by its own prompt (TaskDecomposer, Actor, Monitor, Predictor, Evaluator, Orchestrator), combined with a tree-search-based reasoning loop. Across graph traversal tasks, Tower of Hanoi, PlanBench, and StrategyQA, MAP outperforms standard LLM methods and competitive baselines, with ablations highlighting the critical roles of monitoring, subgoal decomposition, and search. The work demonstrates the value of modularity and planning orchestration in LLMs, and discusses avenues for reducing cost and extending to smaller models and tool use.
Abstract
Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. Both cognitive neuroscience and reinforcement learning (RL) have proposed a number of interacting functional components that together implement search and evaluation in multi-step decision making. These components include conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of specialized modules corresponding to these components, each implemented using an LLM. Together, the modules break down a larger problem into multiple brief, automated calls to the LLM. We evaluate MAP on three challenging planning tasks (graph traversal, Tower of Hanoi, and the PlanBench benchmark) as well as an NLP task requiring multi-step reasoning (StrategyQA). We find that MAP yields significant improvements over both standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), can be effectively combined with smaller and more cost-efficient LLMs (Llama3-70B), and displays superior transfer across tasks. These results suggest the benefit of a modular and multi-agent approach to planning with LLMs.
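To make the idea of recurrently interacting modules concrete, the following is a minimal illustrative sketch on a toy graph traversal problem. It is not the implementation described in this paper: every module is a hypothetical deterministic stand-in for what would be an LLM call, and the function signatures, the toy graph, the search depth, and the control flow are all assumptions chosen for illustration. The module names follow the TL;DR above.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy environment: a small directed graph (not from the paper).
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
    "D": ["F"], "E": ["F"], "F": [],
}

def task_decomposer(start: str, goal: str) -> List[str]:
    """Stand-in: propose an ordered list of subgoals (here, only the final goal)."""
    return [goal]

def actor(state: str, subgoal: str) -> List[str]:
    """Stand-in: propose candidate moves. It deliberately over-proposes every
    node, including invalid moves, so that the monitor has something to reject."""
    return sorted(GRAPH)

def monitor(state: str, action: str) -> bool:
    """Stand-in: accept a move only if the graph has an edge state -> action."""
    return action in GRAPH[state]

def predictor(state: str, action: str) -> str:
    """Stand-in: predict the next state (moving to a node puts us at that node)."""
    return action

def evaluator(state: str, goal: str) -> float:
    """Stand-in: score a state as the negative BFS distance to the goal."""
    frontier, seen = deque([(state, 0)]), {state}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return -float(dist)
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("-inf")  # goal unreachable from this state

def orchestrator(state: str, subgoal: str) -> bool:
    """Stand-in: decide whether the current subgoal has been reached."""
    return state == subgoal

def search(state: str, subgoal: str, depth: int) -> Tuple[float, Optional[str]]:
    """Depth-limited tree search; returns (best value, best first action)."""
    if depth == 0 or orchestrator(state, subgoal):
        return evaluator(state, subgoal), None
    best_value, best_action = float("-inf"), None
    for action in actor(state, subgoal):
        if not monitor(state, action):
            continue  # rejected proposals are never expanded
        value, _ = search(predictor(state, action), subgoal, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

def plan(start: str, goal: str, depth: int = 2, max_steps: int = 10) -> List[str]:
    """Outer loop: work through subgoals, taking one searched action per step."""
    state, path = start, [start]
    for subgoal in task_decomposer(start, goal):
        for _ in range(max_steps):
            if orchestrator(state, subgoal):
                break
            _, action = search(state, subgoal, depth)
            if action is None:
                return path  # dead end: no valid action leads toward the subgoal
            state = predictor(state, action)
            path.append(state)
    return path

print(plan("A", "F"))
```

In this sketch each stand-in has one narrow responsibility, so any of them could in principle be swapped for a prompted LLM call without changing the surrounding loop. That separation of roles is the property the sketch is meant to illustrate, under the stated assumptions.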
