Cost-Awareness in Tree-Search LLM Planning: A Systematic Study

Zihao Zhang, Hui Wei, Kenan Jiang, Shijia Pan, Shu Kai, Fei Liu

TL;DR

This work addresses planning under heterogeneous action costs in LLM-based planning by systematically evaluating tree-search planners (ToT BFS/DFS, MCTS, Bidirectional Search) under explicit budget constraints. It demonstrates that while tree-search improves feasibility over plain prompting, it does not consistently yield cost-optimal plans on long horizons, and that simply increasing search effort yields diminishing returns. Among the approaches, bidirectional search offers the best efficiency and long-horizon success, whereas MCTS achieves the strongest short-horizon optimality. The findings highlight the need for cost-aware reward guidance and principled pruning rather than relying solely on inference-time compute to advance resource-constrained LLM planning.

Abstract

Planning under resource constraints is central to real-world decision making, yet most large language model (LLM) planners assume uniform action costs. We systematically analyze whether tree-search LLM planners are cost-aware and whether they efficiently generate budget-feasible plans. In contrast to black-box prompting, explicit search trees expose intermediate decisions, node evaluations, and failure modes, which allows for controlled ablations of planner behavior. We study depth-first search, breadth-first search, Monte Carlo Tree Search, and bidirectional search within a unified framework. Our experiments show that existing tree-based LLM planners often struggle to find cost-optimal plans, and that additional search computation does not reliably improve optimality. Among the methods evaluated, bidirectional search achieves the best overall efficiency and success rate. MCTS achieves the highest optimality on short-horizon tasks. Tree-search planners are especially valuable for studying LLM planning because their reasoning steps are explicit, in contrast to plain LLMs that internalize planning dynamics through post-training trajectories. Our findings suggest that improving LLM planning under resource constraints will likely require new search algorithms, rather than solely scaling inference-time compute.

Paper Structure

This paper contains 26 sections, 3 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: A visual comparison of the four tree-based LLM planners included in this study; node color intensity indicates search priority. ToT-BFS combines an LLM with breadth-first search, exhaustively exploring all nodes at each depth. ToT-DFS uses depth-first search, prioritizing deeper expansions during the search. MCTS implements Monte Carlo Tree Search, balancing exploration against the results of previous rollouts via Upper Confidence Bounds for Trees (UCT; Eq. \ref{eq:ucb}) to dynamically guide the search. Bidirectional Search (Bi-Search) maintains two search trees, one rooted at the initial state and one at the goal state, and alternates expansions until it finds a connecting state, thereby reducing the effective search depth and pruning the search space.
  • Figure 2: Node expansions vs. success rate. This plot shows the number of nodes expanded to find the optimal plan under the TIGHT constraint.
  • Figure 3: Decision-level diagnostics of cost-aware planning failures. We show the initial and goal states (top) and the action-cost schedule. Each panel zooms into a search step, listing the partial plan, cost usage versus the optimal cost, and the candidate actions with their model scores. (a) High score assigned to a suboptimal action: the model assigns a high score to a costly suboptimal action (selected), diverting the search from the optimal continuation. (b) Forced selection of a suboptimal action: all candidates receive low, similar scores, forcing selection among suboptimal options and keeping the search on the high-cost branch.
  • Figure 4: Failure mode distribution under the TIGHT condition from $L=2$ to $L=8$.
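
The UCT rule named in the Figure 1 caption can be illustrated with a small sketch. This is the textbook UCT formula, not necessarily the exact variant used in the paper. The function names, the `(total_value, visits)` representation of a child node, and the default exploration constant `c = sqrt(2)` are all assumptions made for illustration.

```python
import math


def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT score: mean value plus an exploration bonus.

    Assumption: unvisited children get an infinite score so that each
    child is tried at least once before any child is revisited.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the key of the child with the highest UCT score.

    `children` maps a (hypothetical) action name to a tuple
    (total_value, visits) accumulated from previous rollouts.
    """
    return max(
        children,
        key=lambda name: uct_score(
            children[name][0], children[name][1], parent_visits, c
        ),
    )


# Hypothetical statistics for two candidate actions under one parent node.
children = {"a": (3.6, 4), "b": (0.5, 1)}
parent_visits = 5

greedy_choice = select_child(children, parent_visits, c=0.0)
uct_choice = select_child(children, parent_visits)
```

With `c=0.0` the rule is purely greedy and picks the child with the higher mean value. With the default constant, the rarely visited child receives a larger exploration bonus and is selected instead. Note that this score contains no term for action cost.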