Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
Dongjie Yang, Chengqiang Lu, Qimeng Wang, Xinbei Ma, Yan Gao, Yao Hu, Hai Zhao
TL;DR
This work addresses the challenge of real-world planning under multifaceted constraints by proposing Multiple Aspects of Planning (MAoP), a framework that enables wide-horizon thinking through a strategist that pre-plans across multiple aspects and routes a planning blueprint for a planner to execute over multiple dialogue turns. It couples MAoP with Travel-Sim, an agent-based, causal evaluation framework that simulates realistic travel experiences using live maps and blogs to measure feasibility and personalization via a Travel Plan Similarity Score (TPSS) and a composite PER score. The paper details a three-stage MAoP training pipeline (reward-model, rejection-sampling for the strategist, and RL for the planner), an inference-time architecture (decomposition, routing, aspect-aware thinking), and a distillation path to one-step wide-horizon capability. Empirically, MAoP yields substantial gains over long-horizon baselines and superior scaling with more aspects, while Travel-Sim reveals emergent, adaptive behaviors driven by causal dynamics, offering a more faithful assessment of real-world planning quality. These contributions advance robust, scalable planning with complex constraints and provide a principled, simulation-backed evaluation methodology for real-world LLM planning tasks.
Abstract
Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning requires synthesizing a broad spectrum of parallel and potentially conflicting information and constraints. Travel planning, for example, requires integrating diverse real-world information and user preferences. While LLMs show promise, existing methods that rely on long-horizon thinking struggle to handle multifaceted constraints and produce suboptimal solutions. Motivated by the challenges of real-world travel planning, this paper introduces Multiple Aspects of Planning (MAoP), which equips LLMs with "wide-horizon thinking" to solve planning problems with multifaceted constraints. Instead of planning directly, MAoP uses a strategist to pre-plan from various aspects and provide a planning blueprint for the planner; scaling the number of aspects to cover more constraints gives strong inference-time scalability. In addition, existing benchmarks for multi-constraint planning are flawed because they assess constraints in isolation and ignore the causal dependencies among them; in travel planning, for example, past activities dictate the future itinerary. To address this, we propose Travel-Sim, an agent-based benchmark that assesses plans via real-world simulation and thereby inherently resolves these causal dependencies. This paper advances LLM capabilities in complex planning and offers novel insights for evaluating sophisticated scenarios through simulation.
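The difference between checking constraints in isolation and checking them through simulation can be illustrated with a toy example. The sketch below is not Travel-Sim's implementation: the `Activity` fields, the energy budget, the day window, and the sample plan are all hypothetical, chosen only to show how state carried forward from earlier activities can invalidate a plan that passes every per-activity check.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    start_hour: float      # planned start time on a 24h clock (hypothetical field)
    duration_hours: float  # how long the activity takes (hypothetical field)
    energy_cost: int       # stamina units consumed (hypothetical field)


def check_in_isolation(plan, day_start=9.0, day_end=21.0, max_energy=10):
    """Check each activity alone: it fits in the day and is affordable
    for a fully rested traveler. No state is shared between activities."""
    return all(
        day_start <= a.start_hour
        and a.start_hour + a.duration_hours <= day_end
        and a.energy_cost <= max_energy
        for a in plan
    )


def simulate(plan, day_start=9.0, max_energy=10):
    """Walk through the plan in order, carrying the clock and remaining
    energy forward, and collect every problem that arises."""
    clock, energy = day_start, max_energy
    problems = []
    for a in plan:
        if a.start_hour < clock:
            problems.append(
                f"{a.name}: planned for {a.start_hour}, "
                f"but the previous activity ends at {clock}"
            )
        begin = max(clock, a.start_hour)
        if a.energy_cost > energy:
            problems.append(
                f"{a.name}: needs {a.energy_cost} energy, only {energy} left"
            )
        energy = max(0, energy - a.energy_cost)
        clock = begin + a.duration_hours
    return problems


# Hypothetical one-day plan: every item looks fine on its own.
plan = [
    Activity("mountain hike", 9.0, 5.0, 7),
    Activity("museum tour", 13.0, 3.0, 4),
    Activity("night market", 18.0, 2.0, 2),
]

print("passes isolated checks:", check_in_isolation(plan))
for problem in simulate(plan):
    print("simulation issue:", problem)
```

In this toy plan the isolated check accepts all three activities, while the sequential walk-through flags the museum tour for overlapping the hike and flags both later activities for exceeding the energy left after it. This is the kind of dependency the abstract argues that isolated constraint checks miss.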
