Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems

Yi Zhang, Yushen Long, Yun Ni, Liping Huang, Xiaohong Wang, Jun Liu

TL;DR

The paper addresses the data-efficiency and constraint-enforcement challenges of RL in mobility-on-demand by introducing a training-free hybrid framework where an LLM acts as a meta-optimizer to evolve high-level objectives that guide a low-level, constraint-conscious routing and dispatch optimizer. A harmony search–driven prompt evolution loop enables closed-loop refinement of semantic objectives using solver feedback, bridging high-level reasoning with low-level dynamics. Across NYC and Chicago taxi datasets, the approach achieves a mean improvement of about 16% in passenger waiting time over state-of-the-art baselines, with pronounced gains in large-scale, imbalanced scenarios, demonstrating robustness and practicality. The work highlights a scalable, data-efficient path for dynamic decision-making that integrates semantic reasoning with rigorous optimization, potentially informing real-time, constraint-aware ride-hailing systems.

Abstract

Online ride-hailing platforms aim to deliver efficient mobility-on-demand services, but they often struggle to balance dynamic and spatially heterogeneous supply and demand. Existing methods typically fall into two categories: reinforcement learning (RL) approaches, which suffer from data inefficiency, oversimplified modeling of real-world dynamics, and difficulty enforcing operational constraints; and decomposed online optimization methods, which rely on manually designed high-level objectives that lack awareness of low-level routing dynamics. To address these issues, we propose a novel hybrid framework that integrates a large language model (LLM) with mathematical optimization in a dynamic hierarchical system: (1) it is training-free, removing the need for large-scale interaction data as in RL, and (2) it leverages the LLM to bridge cognitive limitations caused by problem decomposition by adaptively generating high-level objectives. Within this framework, the LLM serves as a meta-optimizer, producing semantic heuristics that guide a low-level optimizer responsible for constraint enforcement and real-time decision execution. These heuristics are refined through a closed-loop evolutionary process, driven by harmony search, which iteratively adapts the LLM prompts based on feasibility and performance feedback from the optimization layer. Extensive experiments based on scenarios derived from both the New York and Chicago taxi datasets demonstrate the effectiveness of our approach, achieving an average improvement of 16% compared to state-of-the-art baselines.
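
To make the evolutionary loop mentioned in the abstract more concrete, the following is a minimal sketch of generic harmony search, not the paper's implementation. It assumes, purely for illustration, that a candidate objective can be encoded as a numeric weight vector, whereas the paper evolves LLM prompts through prompt-refinement operators. The function `toy_waiting_time`, the parameter names (`hms`, `hmcr`, `par`, `bw`), and all default values are hypothetical stand-ins; in the real system the fitness would come from simulator trajectories.

```python
import random


def harmony_search(fitness, dim, bounds, hms=8, hmcr=0.9, par=0.3,
                   bw=0.05, iters=200, seed=0):
    """Minimize `fitness` over a box using textbook harmony search.

    hms: harmony memory size, hmcr: memory considering rate,
    par: pitch adjusting rate, bw: pitch bandwidth (fraction of the range).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialize the harmony memory with random candidates.
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]

    for _ in range(iters):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # Reuse a value from memory, optionally with a small pitch shift.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-bw, bw) * (hi - lo)
            else:
                # Otherwise explore a fresh random value.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))

        # Replace the worst stored harmony only if the new one is better.
        score = fitness(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_waiting_time(weights):
    # Hypothetical stand-in for a simulator-derived fitness (lower is better).
    target = [0.7, 0.2, 0.1]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_score = harmony_search(toy_waiting_time, dim=3,
                                          bounds=(0.0, 1.0))
print(best_weights, best_score)
```

Because the memory only ever replaces its worst member with a strictly better candidate, the best stored score never degrades across iterations. This is the property that makes the loop a reasonable mental model for refinement driven by solver feedback.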

Paper Structure

This paper contains 69 sections, 3 theorems, 20 equations, 21 figures, 7 tables, 4 algorithms.

Key Result

Proposition 1

Replacing the logical constraints (cons:time1) with the inequalities (eq:time1) in the model leads to the same solution.

Figures (21)

  • Figure 1: Overall control flow framework. Search block: Uses the harmony search algorithm to iteratively select and apply three prompt-refinement operators. Initial iterations prioritize heuristics from Operator 1. Dynamic System block: At each timestep, the LLM generates high-level objectives based on the refined prompt and simulator-reported states (e.g., driver locations, pending orders). These objectives guide a two-level optimizer: a high-level dispatcher assigns orders, and a low-level router determines feasible visiting sequences. The optimizer’s decisions are executed in the simulator, updating system states. This closed-loop process continues until the simulation horizon concludes. Evaluation block: Computes the fitness score from simulator trajectories and pairs it with the LLM-inferred objectives to update the harmony search population.
  • Figure 2: Inference strategies in a single simulation run: (1) Open-loop control, where a one-time query occurs at the beginning of the test. (2) Closed-loop control, where queries occur at each step of the test.
  • Figure 3: Evolutionary mechanism of the harmony search algorithm
  • Figure 4: Run 1
  • Figure 5: Run 2
  • ...and 16 more figures
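
The two inference strategies named in the Figure 2 caption can be sketched as a single simulation loop that differs only in when the objective is queried. This is a toy illustration under stated assumptions, not the paper's simulator: `toy_query` stands in for the LLM call, `toy_step` stands in for the optimizer plus simulator, and the integer "imbalance" state is a hypothetical placeholder for the real system state.

```python
def run_simulation(query_objective, step, initial_state, horizon, closed_loop):
    """Run one simulation and count how often the objective is queried."""
    state = initial_state
    # Both strategies query once at the beginning of the run.
    objective = query_objective(state)
    queries = 1
    for t in range(horizon):
        if closed_loop and t > 0:
            # Closed loop: refresh the objective from the current state.
            objective = query_objective(state)
            queries += 1
        state = step(state, objective)
    return state, queries


def toy_query(imbalance):
    # Hypothetical stand-in for the LLM: pick an objective from the state.
    return "rebalance" if abs(imbalance) > 2 else "serve"


def toy_step(imbalance, objective):
    # Hypothetical stand-in for optimizer + simulator dynamics.
    if objective == "rebalance":
        return imbalance - 2 if imbalance > 0 else imbalance + 2
    return imbalance + 1


open_state, open_queries = run_simulation(toy_query, toy_step, 5, 5, False)
closed_state, closed_queries = run_simulation(toy_query, toy_step, 5, 5, True)
print(open_queries, closed_queries)
```

In this sketch the open-loop run issues one query in total, while the closed-loop run issues one query per step, so the number of LLM calls scales with the simulation horizon.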

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Proposition 3