Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota

Farid Najar, Dominique Barth, Yann Strozecki

Abstract

This paper introduces and formalizes the Dynamic and Stochastic Vehicle Routing Problem with Emission Quota (DS-QVRP-RR), a novel routing problem that integrates dynamic demand acceptance and routing with a global emission constraint. A key contribution is a two-layer optimization framework designed to facilitate the anticipatory rejection of demands and the generation of new routes. To solve this problem, we develop hybrid algorithms that combine reinforcement learning with combinatorial optimization techniques. We present a comprehensive computational study that compares our approach against traditional methods. Our findings demonstrate the relevance of our approach for different types of inputs, even when the horizon of the problem is uncertain.

Paper Structure

This paper contains 30 sections, 3 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: Destinations of the realistic instance (top), clustered instance (middle), and uniform instance (bottom).
  • Figure 2: Learning curves on the three datasets with DoD = 1. The shaded areas indicate 95% confidence intervals. The horizontal lines mark the mean performance $\operatorname{Ad}$ of the baseline methods.
  • Figure 3: Distribution of performance improvements over FAFS on the realistic instance scenarios (means shown as points).
  • Figure 4: Distribution of performance improvements over FAFS on the clustered instance scenarios (means shown as points).
  • Figure 5: Distribution of performance improvements over FAFS on the uniform instance scenarios (means shown as points).
  • ...and 7 more figures