Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints

Dongjie Yang, Chengqiang Lu, Qimeng Wang, Xinbei Ma, Yan Gao, Yao Hu, Hai Zhao

TL;DR

This work addresses the challenge of real-world planning under multifaceted constraints by proposing Multiple Aspects of Planning (MAoP), a framework that enables wide-horizon thinking: a strategist pre-plans across multiple aspects and routes a planning blueprint to a planner, which executes it over multiple dialogue turns. It couples MAoP with Travel-Sim, an agent-based, causal evaluation framework that simulates realistic travel experiences using live maps and blogs to measure feasibility and personalization via a Travel Plan Similarity Score (TPSS) and a composite PER score. The paper details a three-stage MAoP training pipeline (reward model training, rejection sampling for the strategist, and RL for the planner), an inference-time architecture (decomposition, routing, aspect-aware thinking), and a distillation path to one-step wide-horizon capability. Empirically, MAoP yields substantial gains over long-horizon baselines and scales better as more aspects are added, while Travel-Sim reveals emergent, adaptive behaviors driven by causal dynamics, offering a more faithful assessment of real-world planning quality. These contributions advance robust, scalable planning with complex constraints and provide a principled, simulation-backed evaluation methodology for real-world LLM planning tasks.

Abstract

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. Travel planning, for example, requires integrating diverse real-world information with user preferences. While LLMs show promise, existing methods with long-horizon thinking struggle to handle multifaceted constraints, leading to suboptimal solutions. Motivated by the challenges of real-world travel planning, this paper introduces Multiple Aspects of Planning (MAoP), which empowers LLMs with "wide-horizon thinking" to solve planning problems with multifaceted constraints. Instead of planning directly, MAoP uses a strategist to conduct pre-planning from various aspects and provide a planning blueprint for the planner, enabling strong inference-time scalability: more aspects can be added to cover more constraints. In addition, existing benchmarks for multi-constraint planning are flawed because they assess constraints in isolation, ignoring causal dependencies among the constraints, e.g., in travel planning, where past activities dictate the future itinerary. To address this, we propose Travel-Sim, an agent-based benchmark that assesses plans via real-world simulation, thereby inherently resolving these causal dependencies. This paper advances LLM capabilities in complex planning and offers novel insights for evaluating sophisticated scenarios through simulation.
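The strategist-then-planner flow described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `LLM`, `Blueprint`, `strategist`, and `planner`, the prompt wording, and the `max_aspects` parameter are all hypothetical, and the model is represented by any text-in, text-out callable.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any text-in, text-out model call (not an API from the paper).
LLM = Callable[[str], str]


@dataclass
class Blueprint:
    aspects: List[str]   # aspects selected by the strategist, in routed order
    guidance: List[str]  # one pre-planning note per aspect


def strategist(request: str, context: str, llm: LLM, max_aspects: int = 3) -> Blueprint:
    """Decompose the request into aspects, then pre-plan each aspect."""
    raw = llm(f"List up to {max_aspects} planning aspects, one per line.\nRequest: {request}")
    # Keep non-empty lines only, capped at max_aspects.
    aspects = [line.strip() for line in raw.splitlines() if line.strip()][:max_aspects]
    guidance = [llm(f"Aspect: {a}\nContext: {context}\nGive brief guidance.") for a in aspects]
    return Blueprint(aspects, guidance)


def planner(request: str, blueprint: Blueprint, llm: LLM) -> str:
    """Consume the blueprint one aspect per dialogue turn, then write the plan."""
    history = [f"Request: {request}"]
    for aspect, note in zip(blueprint.aspects, blueprint.guidance):
        history.append(f"[{aspect}] {note}")
        # One turn of aspect-aware thinking, appended to the running dialogue.
        history.append(llm("\n".join(history) + "\nThink about this aspect."))
    return llm("\n".join(history) + "\nWrite the final plan.")
```

In this sketch, widening the horizon means raising `max_aspects`: the strategist adds more aspects to the blueprint while the planner loop stays unchanged, which is one simple way to picture the inference-time scaling the abstract describes.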

Paper Structure

This paper contains 104 sections, 3 equations, 9 figures, 10 tables, 3 algorithms.

Figures (9)

  • Figure 1: The comparison between long-horizon and wide-horizon thinking reveals distinct cognitive approaches. While long-horizon thinking involves deep exploration of a single reasoning trajectory, wide-horizon thinking incorporates heterogeneous information and constraints in long contexts by considering various aspects. It necessitates parallel consideration of multiple dimensions, which are subsequently integrated to generate comprehensive outputs.
  • Figure 2: The overview of the MAoP training and inference process.
  • Figure 3: The comprehensive comparison results between MAoP and the baseline methods.
  • Figure 4: We experiment with two strategists (Qwen 7B & R1-Distill Qwen-7B) and two planners (Qwen 7B & Gemini 2.5) to showcase the scaling capability of the strategists.
  • Figure 5: The simulation trajectory is not always consistent with the planned one because the traveler agent can change the subsequent itinerary based on the current situation. The FEA score is used to calculate the similarity of these two trajectories.
  • ...and 4 more figures