Learning to Search for Vehicle Routing with Multiple Time Windows

Kuan Xu, Zhiguang Cao, Chenlong Zheng, Linong Liu

TL;DR

The paper tackles the Vehicle Routing Problem with Multiple Time Windows (VRPMTW) with RL-AVNS, a reinforcement learning-guided adaptive variable neighborhood search. A time-window fitness metric informs the shaking phase, and a transformer-based policy selects local-search operators from the current solution state, with the goal of satisfying the time-window constraints while minimizing travel cost. On VRPTW and VRPMTW benchmarks, RL-AVNS outperforms traditional VNS, AVNS, and several neural baselines, and it generalizes to instance configurations not seen during training. The work shows a practical way to combine RL with metaheuristics for logistics routing, evaluated on realistic replenishment scenarios.

Abstract

In this study, we propose a reinforcement learning-based adaptive variable neighborhood search (RL-AVNS) method designed for effectively solving the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). Unlike traditional adaptive approaches that rely solely on historical operator performance, our method integrates a reinforcement learning framework to dynamically select neighborhood operators based on real-time solution states and learned experience. We introduce a fitness metric that quantifies customers' temporal flexibility to improve the shaking phase, and employ a transformer-based neural policy network to intelligently guide operator selection during the local search. Extensive computational experiments are conducted on realistic scenarios derived from the replenishment of unmanned vending machines, characterized by multiple clustered replenishment windows. Results demonstrate that RL-AVNS significantly outperforms traditional variable neighborhood search (VNS), adaptive VNS (AVNS), and state-of-the-art learning-based heuristics, achieving substantial improvements in solution quality and computational efficiency across various instance scales and time window complexities. Particularly notable is the algorithm's capability to generalize effectively to problem instances not encountered during training, underscoring its practical utility for complex logistics scenarios.
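
To make the idea of state-dependent operator selection concrete, the following is a minimal sketch and not the authors' RL-AVNS. It assumes a toy closed-tour problem, two generic move operators (`swap_two`, `reverse_segment`), and a hypothetical `uniform_selector` that picks operators at random in the place where a learned policy would sit. All names are illustrative, and shaking, time windows, and training are omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Solution = List[int]
Operator = Callable[[Solution, random.Random], Solution]
Selector = Callable[[Solution, random.Random], int]


def swap_two(sol: Solution, rng: random.Random) -> Solution:
    """Swap two randomly chosen positions."""
    i, j = rng.sample(range(len(sol)), 2)
    out = sol[:]
    out[i], out[j] = out[j], out[i]
    return out


def reverse_segment(sol: Solution, rng: random.Random) -> Solution:
    """Reverse a randomly chosen contiguous segment (2-opt style move)."""
    i, j = sorted(rng.sample(range(len(sol)), 2))
    return sol[:i] + sol[i:j + 1][::-1] + sol[j + 1:]


def tour_cost(sol: Solution, dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed tour that visits nodes in the given order."""
    return sum(dist[a][b] for a, b in zip(sol, sol[1:] + sol[:1]))


def uniform_selector(n_ops: int) -> Selector:
    """Placeholder selector: ignores the state and picks uniformly.

    A learned policy would instead map the current solution to an
    operator index; that part is an assumption and is not modeled here.
    """
    def select(solution: Solution, rng: random.Random) -> int:
        return rng.randrange(n_ops)
    return select


def guided_search(
    initial: Solution,
    operators: Sequence[Operator],
    select: Selector,
    cost: Callable[[Solution], float],
    iters: int,
    seed: int = 0,
) -> Tuple[Solution, float]:
    """Repeatedly ask the selector for an operator and keep improving moves."""
    rng = random.Random(seed)
    best, best_cost = initial[:], cost(initial)
    for _ in range(iters):
        k = select(best, rng)              # operator chosen from current state
        candidate = operators[k](best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:     # accept only strict improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Swapping `uniform_selector` for a function that inspects the solution is the only change needed to plug in a different selection rule, which is the design point this sketch is meant to show.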

Paper Structure

This paper contains 28 sections, 2 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: An illustration of VRPMTW, where each customer features two optional time windows.
  • Figure 2: Illustration of the fitness calculation for node $i$. The position of the arrow indicates the time at which the vehicle arrives at node $i$, and the length covered by the brace is the fitness value.
  • Figure 3: The architecture of our policy network in RL-AVNS.
  • Figure 4: Iteration process. Applying the shaking operator usually lets the search escape local optima, and the subsequent local search then improves the solution, moving it toward the global optimum.
  • Figure 5: The effect of fitness metric on the final solution.
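
The multiple-time-window setting illustrated in Figure 1 can be sketched with a small helper. This is an illustrative assumption rather than the paper's formulation: it supposes that a vehicle may wait for a window to open, that service must start inside one of the customer's windows, and that windows are given as `(open, close)` pairs. The function name and signature are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Window = Tuple[float, float]  # (open, close) times for one window


def earliest_service_start(
    arrival: float, windows: Sequence[Window]
) -> Optional[float]:
    """Return the earliest feasible service start, or None if infeasible.

    A window is usable if the vehicle arrives no later than its close
    time. If the vehicle arrives before the window opens, it waits.
    """
    feasible = [max(arrival, open_t) for open_t, close_t in windows
                if arrival <= close_t]
    return min(feasible) if feasible else None
```

For a customer with windows `[(8, 10), (14, 16)]`, an arrival at time 12 misses the first window and waits for the second, while an arrival after 16 has no feasible window.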