Improving Planning with Large Language Models: A Modular Agentic Architecture

Taylor Webb; Shanka Subhra Mondal; Ida Momennejad

TL;DR

This paper addresses the challenge of multi-step planning with large language models by proposing MAP, a modular, agentic architecture that decomposes planning into specialized LLM-powered components. MAP plans through recurrent interactions among module-specific prompts (TaskDecomposer, Actor, Monitor, Predictor, Evaluator, Orchestrator) and a tree-search-powered reasoning loop. On graph traversal tasks, Tower of Hanoi, PlanBench, and StrategyQA, MAP outperforms standard LLM methods and competitive baselines, and ablations highlight the critical roles of monitoring, subgoal decomposition, and search. The work demonstrates the value of modularity and planning orchestration in LLMs, and discusses avenues for reducing cost and extending to smaller models and tool use.

Abstract

Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. Both cognitive neuroscience and reinforcement learning (RL) have proposed a number of interacting functional components that together implement search and evaluation in multi-step decision making. These components include conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of the specialized modules mentioned above, each implemented using an LLM. MAP improves planning through the interaction of specialized modules that break down a larger problem into multiple brief automated calls to the LLM. We evaluate MAP on three challenging planning tasks (graph traversal, Tower of Hanoi, and the PlanBench benchmark) as well as an NLP task requiring multi-step reasoning (StrategyQA). We find that MAP yields significant improvements over both standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), can be effectively combined with smaller and more cost-efficient LLMs (Llama3-70B), and displays superior transfer across tasks. These results suggest the benefit of a modular and multi-agent approach to planning with LLMs.
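
As a rough illustration of what "recurrent interaction of specialized modules" could look like in code, the sketch below wires together stand-ins for action proposal, monitoring, state prediction, and state evaluation on a toy graph. Every name in it (`propose`, `monitor`, `predict`, `evaluate`, `choose_action`, `plan`, and the graph itself) is hypothetical and chosen for illustration. In MAP each module is an LLM call; plain Python functions are substituted here so the control flow can actually run. This is a sketch under those assumptions, not the paper's algorithm, and it omits task decomposition and tree search.

```python
from typing import Dict, List, Optional

# Toy environment (an assumption for illustration): an undirected graph.
GRAPH: Dict[int, List[int]] = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}


def propose(state: int, feedback: List[int]) -> List[int]:
    """Actor stand-in: naive candidate moves, minus ones already rejected."""
    candidates = [state + 1, state + 2, state - 1]
    return [c for c in candidates if c not in feedback]


def monitor(state: int, action: int) -> bool:
    """Monitor stand-in: accept only moves along an existing edge."""
    return action in GRAPH.get(state, [])


def predict(state: int, action: int) -> int:
    """Predictor stand-in: moving to a node lands on that node."""
    return action


def evaluate(state: int, goal: int) -> float:
    """Evaluator stand-in: crude heuristic, higher is better."""
    return -abs(goal - state)


def choose_action(state: int, goal: int, max_rounds: int = 3) -> Optional[int]:
    """Propose, gate, and score actions; rejected actions are fed back."""
    feedback: List[int] = []
    for _ in range(max_rounds):
        proposals = propose(state, feedback)
        valid = [a for a in proposals if monitor(state, a)]
        feedback.extend(a for a in proposals if a not in valid)
        if valid:
            # Score each valid action by the value of its predicted state.
            return max(valid, key=lambda a: evaluate(predict(state, a), goal))
    return None


def plan(start: int, goal: int, max_steps: int = 10) -> List[int]:
    """Orchestrator stand-in: repeat until the goal is reached or steps run out."""
    state, actions = start, []
    while state != goal and len(actions) < max_steps:
        action = choose_action(state, goal)
        if action is None:
            break
        actions.append(action)
        state = predict(state, action)
    return actions
```

With this toy graph, `plan(0, 4)` returns `[2, 3, 4]`. The point of the separation is that each function has one narrow job, so any one of them could be swapped for a prompted LLM call without changing the surrounding loop.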

Paper Structure (31 sections, 6 figures, 16 tables, 1 algorithm)

Introduction
Approach
Modules
Action Proposal Loop
Tree Search
Plan Generation
Experiments
Tasks
Baselines
Results
Ablation study
Related work
Conclusion and future directions
Appendix
Supplementary Algorithms
...and 16 more sections

Figures (6)

Figure 1: Modular Agentic Planner (MAP). The agent receives states from the environment and high-level goals. These are processed by a set of specialized LLM modules. The $\operatorname{Task Decomposer}$ receives high-level goals and generates a series of subgoals. The $\operatorname{Actor}$ generates proposed actions given a state and a subgoal. The $\operatorname{Monitor}$ gates these proposed actions based on whether they violate certain constraints (e.g., task rules) and provides feedback to the $\operatorname{Actor}$. The $\operatorname{Predictor}$ predicts the next state given the current state and a proposed action. The $\operatorname{Evaluator}$ is used to estimate the value of a predicted state. The $\operatorname{Predictor}$ and $\operatorname{Evaluator}$ are used together to perform tree search. The $\operatorname{Orchestrator}$ determines when each subgoal has been achieved, and when the final goal has been achieved, at which point the plan is emitted to the environment as a series of actions.
Figure 2: Graph traversal results. '% solved' indicates percentage of problems solved without proposing invalid actions ($\uparrow$ better). GPT-4 Zero-shot, ICL, CoT, and MAD baselines are deterministic, and therefore a single run was performed on all problems. Note that MAP did not employ tree search on the Steppath task, and did not employ task decomposition on any of the graph traversal tasks. Without tree search, MAP's performance is deterministic, and therefore only a single run was performed on the Steppath task, whereas we performed 5 runs with ToT. Gray error bars reflect 95% binomial confidence intervals (for models evaluated on a single run). Dots reflect values of 0%. Dark bars indicate average performance over multiple plans/runs. Light bars indicate best performance. For Valuepath, Detour, and Reward Revaluation we performed 10, 10, and 5 runs respectively with MAP and ToT, and present average performance $\pm$ the standard error of the mean (black error bars).
Figure 3: Tower of Hanoi (ToH) results. '% solved' indicates percentage of problems solved without proposing invalid actions ($\uparrow$ better). '% invalid' indicates percentage of moves that are invalid ($\downarrow$ better). Note that 4-disk problems are out-of-distribution (OOD). GPT-4 Zero-shot, ICL, CoT, and MAD baselines are deterministic and reflect a single run. Gray error bars reflect 95% binomial confidence intervals. Dots reflect values of 0%. Dark bars indicate average performance over multiple plans/runs. Light bars indicate best performance. MAP results for 3-disk problems reflect the average over 5 runs $\pm$ the standard error of the mean (black error bars). MAP results for 4-disk problems reflect a single run, due to the high computational cost of multiple runs.
Figure 4: Graph Traversal. We investigated two graph traversal tasks utilizing a challenging graph with community structure. Steppath: Find shortest path between two nodes, e.g. node 3 and node 7. Valuepath: Find shortest path from starting location (e.g., node 10) to location with maximum reward (node 8 in depicted example).
Figure 5: Tower of Hanoi. Top: Depiction of the Tower of Hanoi (ToH) puzzle. Disks are stacked in order of decreasing size on the leftmost peg. The goal is to move these disks so that they are stacked in order of decreasing size on the rightmost peg. Only the disk on the top of the stack may be moved, and a disk can only be placed on top of larger disks (or on an empty peg). The version shown involves three disks, but more disks can be used (making the task significantly more difficult). Bottom: Modified text-based version of ToH. Three lists are presented, labelled A, B and C. A set of integers is distributed amongst these lists. The goal is to move the numbers so that they are arranged in ascending order in list C. Only the number at the end of the list may be moved, and a number can only be placed in front of a smaller number. Multiple problem instances were created by varying the initial state.
...and 1 more figures
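
The rules of the text-based Tower of Hanoi variant in the Figure 5 caption can be sketched as a small move checker. The function names (`is_valid_move`, `apply_move`, `is_solved`) and the dictionary state layout are hypothetical. The code also assumes one particular reading of the caption: a moved number is appended to the end of the destination list, and this is allowed only when that list is empty or its current last number is smaller. The starting state in the usage note is likewise an assumption, since the caption says only that initial states were varied.

```python
from typing import Dict, List

State = Dict[str, List[int]]  # e.g. {"A": [0, 1, 2], "B": [], "C": []}


def is_valid_move(state: State, src: str, dst: str) -> bool:
    """Check a move of the last number in `src` to the end of `dst`."""
    if src == dst or src not in state or dst not in state:
        return False
    if not state[src]:
        return False  # nothing to move
    moving = state[src][-1]  # only the number at the end may be moved
    # Assumed reading: the destination's last number must be smaller.
    return not state[dst] or state[dst][-1] < moving


def apply_move(state: State, src: str, dst: str) -> State:
    """Return the successor state, leaving the input state unchanged."""
    if not is_valid_move(state, src, dst):
        raise ValueError(f"invalid move {src} -> {dst}")
    new_state = {name: list(items) for name, items in state.items()}
    new_state[dst].append(new_state[src].pop())
    return new_state


def is_solved(state: State, target: str = "C") -> bool:
    """All numbers sit in the target list, in ascending order."""
    others_empty = all(not items for name, items in state.items() if name != target)
    return others_empty and state[target] == sorted(state[target])
```

Starting from `{"A": [0, 1, 2], "B": [], "C": []}`, the seven moves A to C, A to B, C to B, A to C, B to A, B to C, A to C pass the checker and end in a solved state, while moving 1 onto a list that ends in 2 is rejected. A checker of this kind plays the role that the Figure 1 caption assigns to the Monitor, with the difference that MAP implements that role with an LLM instead of hard-coded rules.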
