End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW

Abdo Abouelrous, Laurens Bliek, Yaoxin Wu, Yingqian Zhang

TL;DR

The paper introduces an end-to-end deep reinforcement learning approach for a stochastic, multi-objective capacitated vehicle routing problem with time windows (C-VRPTW). It combines a POMO-based multi-objective (MO) component with Efficient Active Search (EAS) and a scenario clustering strategy to handle travel-time uncertainty while constructing a Pareto front across the objectives. Empirical results show competitive Pareto-front quality and substantially reduced training runtime compared to baselines, and ablation studies examine the effect of Monte Carlo evaluation and travel-time variability. The work demonstrates that end-to-end MO optimization under stochasticity is feasible and offers a framework that can be extended to other routing problems.

Abstract

In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although different objectives could be incorporated. Learning-based methods offer substantial computational advantages as they can repeatedly solve problems with limited intervention from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. To address this, we propose a model that simultaneously handles stochasticity and multi-objectivity, and we provide a refined training mechanism for this model that uses scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto front of good quality within acceptable run times compared to three baselines.
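To make the two objectives and the notion of a Pareto front concrete, the following is a minimal sketch, not the paper's model. It assumes a hypothetical noise model (each base travel time scaled by a uniform random factor), ignores capacity and time-window constraints, and uses illustrative function names (`route_objectives`, `sample_scenario`, `expected_objectives`, `pareto_front`) that do not come from the paper. It estimates expected total travel time and makespan by Monte Carlo sampling and then filters candidate solutions down to the non-dominated ones.

```python
import random


def route_objectives(routes, travel_time):
    """Return (total travel time, makespan) for a set of routes.

    Each route is a list of node indices that starts and ends at the depot.
    Makespan is taken here as the duration of the longest route.
    """
    durations = [
        sum(travel_time[a][b] for a, b in zip(route, route[1:]))
        for route in routes
    ]
    return sum(durations), max(durations)


def sample_scenario(base, rng, spread):
    """Hypothetical noise model: scale every base travel time by a
    factor drawn uniformly from [1 - spread, 1 + spread]."""
    return [[t * rng.uniform(1 - spread, 1 + spread) for t in row] for row in base]


def expected_objectives(routes, base, n_scenarios=200, spread=0.2, seed=0):
    """Monte Carlo estimate of both objectives over sampled scenarios."""
    rng = random.Random(seed)
    total_sum, span_sum = 0.0, 0.0
    for _ in range(n_scenarios):
        total, span = route_objectives(routes, sample_scenario(base, rng, spread))
        total_sum += total
        span_sum += span
    return total_sum / n_scenarios, span_sum / n_scenarios


def pareto_front(points):
    """Keep the points that no other point dominates (both objectives minimized)."""
    return [
        p for p in points
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
    ]


if __name__ == "__main__":
    base = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # toy 3-node travel-time matrix
    candidates = [[[0, 1, 0], [0, 2, 0]], [[0, 1, 2, 0]]]
    estimates = [expected_objectives(c, base) for c in candidates]
    print(pareto_front(estimates))
```

Sampling many scenarios per candidate is what makes evaluation expensive under uncertainty, which is the cost that the scenario clustering described in the abstract is meant to reduce during training.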

Paper Structure

This paper contains 16 sections, 7 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: Visual illustration of our complete approach using pre-trained POMO model.
  • Figure 2: Overview of solution generation with the model of Lin et al. (2022).
  • Figure 3: Histogram of $Z$ values of EAS-cluster against EAS-basic.
  • Figure 4: Histogram of $Z$ values of EAS-cluster against NoEAS.
  • Figure 5: Plots comparing the Pareto Front generated by EAS-cluster against other methods for one of the instances of size 200.