Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota

Farid Najar; Dominique Barth; Yann Strozecki

Abstract

This paper introduces and formalizes the Dynamic and Stochastic Vehicle Routing Problem with Emission Quota (DS-QVRP-RR), a novel routing problem that integrates dynamic demand acceptance and routing under a global emission constraint. A key contribution is a two-layer optimization framework that enables anticipatory rejection of demands and the generation of new routes. To solve the problem, we develop hybrid algorithms that combine reinforcement learning with combinatorial optimization techniques. We present a comprehensive computational study comparing our approach against traditional methods. Our findings demonstrate the relevance of our approach across different types of inputs, even when the problem horizon is uncertain.
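To make the two-layer idea concrete, here is a minimal sketch of an online episode in which an acceptance layer decides on each arriving demand and a routing layer updates the route, subject to an emission quota. Everything in it is an assumption for illustration, not the paper's model: a single vehicle, Euclidean distances, emissions proportional to distance (`emission_per_km`), cheapest insertion as the routing layer, and a hypothetical greedy rule `accept_if_feasible` in the slot where the paper places a learned reinforcement-learning policy. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical policy signature: (current route, candidate route, new demand) -> accept?
AcceptPolicy = Callable[[List[Point], List[Point], Point], bool]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_length(route: List[Point], depot: Point) -> float:
    """Length of the closed tour depot -> route[0] -> ... -> route[-1] -> depot."""
    stops = [depot] + route + [depot]
    return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))


def cheapest_insertion(route: List[Point], depot: Point, p: Point) -> List[Point]:
    """Routing layer (assumption): insert p at the position adding the least length."""
    best_route: List[Point] = []
    best_len = math.inf
    for i in range(len(route) + 1):
        candidate = route[:i] + [p] + route[i:]
        length = route_length(candidate, depot)
        if length < best_len:  # strict: ties keep the earliest position
            best_route, best_len = candidate, length
    return best_route


def accept_if_feasible(route: List[Point], candidate: List[Point], p: Point) -> bool:
    """Hypothetical greedy acceptance layer: accept every demand that fits the quota."""
    return True


def run_episode(demands: List[Point], depot: Point, quota: float,
                emission_per_km: float,
                accept_policy: AcceptPolicy) -> Tuple[int, List[Point]]:
    """Reveal demands one at a time; acceptance layer decides, routing layer inserts."""
    route: List[Point] = []
    accepted = 0
    for p in demands:
        candidate = cheapest_insertion(route, depot, p)
        # Emissions assumed proportional to the total tour length
        within_quota = emission_per_km * route_length(candidate, depot) <= quota
        if within_quota and accept_policy(route, candidate, p):
            route = candidate
            accepted += 1
    return accepted, route


if __name__ == "__main__":
    demands = [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (-1.0, 0.0)]
    accepted, route = run_episode(demands, (0.0, 0.0), quota=6.0,
                                  emission_per_km=1.0,
                                  accept_policy=accept_if_feasible)
    print(accepted, route)
```

With a quota of 6, the far demand at (5, 5) is rejected because any tour reaching it exceeds the quota, while the three nearby demands are accepted. An anticipatory policy, which is what the learned layer is meant to provide, could instead reject a feasible demand now to keep quota available for demands expected later.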

Paper Structure

This paper contains 30 sections, 3 equations, 12 figures, and 2 tables.

Introduction
Related Works
Our Contributions
Problem Setting
Algorithmic framework
Two-layer decomposition
Offline and online phases
Methods
Routing Method
DQVRP Model
MDP Formulation
Distance to assigned destinations.
Distance to potential future demands.
Observation vector.
Assignment Methods
...and 15 more sections

Figures (12)

Figure 1: Destinations of the realistic instance (top), clustered instance (middle), and uniform instance (bottom).
Figure 2: Learning curves on the three datasets with DoD = 1. The shaded areas indicate 95% confidence intervals. The horizontal lines mark the mean performance $\mathrm{Ad}$ of the baseline methods.
Figure 3: Distribution of performance improvements over FAFS on the realistic instance scenarios (means shown as points).
Figure 4: Distribution of performance improvements over FAFS on the clustered instance scenarios (means shown as points).
Figure 5: Distribution of performance improvements over FAFS on the uniform instance scenarios (means shown as points).
...and 7 more figures
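Figure 2 shades 95% confidence intervals, and Figures 3 to 5 show per-scenario performance improvements over FAFS. The exact procedures are not stated in this section, so the sketch below rests on two assumptions: the improvement on a scenario is the plain difference in Ad between a method and the baseline, and intervals use the normal approximation mean ± 1.96·s/√n (a t-quantile or a bootstrap would be reasonable alternatives for few runs). The function names and the toy scores are hypothetical and are not results from the paper.

```python
import math
import statistics
from typing import List, Tuple


def improvements(method: List[float], baseline: List[float]) -> List[float]:
    """Per-scenario improvement, assumed here to be a difference (method - baseline)."""
    if len(method) != len(baseline):
        raise ValueError("need one baseline score per scenario")
    return [m - b for m, b in zip(method, baseline)]


def mean_ci95(samples: List[float]) -> Tuple[float, float, float]:
    """Mean and normal-approximation 95% CI: mean +/- 1.96 * s / sqrt(n)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for a sample standard deviation")
    m = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Toy scores for four scenarios (illustrative only)
    method_scores = [12.0, 10.0, 11.0, 13.0]
    baseline_scores = [10.0, 10.0, 9.0, 11.0]
    diffs = improvements(method_scores, baseline_scores)
    print(diffs)             # [2.0, 0.0, 2.0, 2.0]
    print(mean_ci95(diffs))  # mean 1.5, interval about (0.52, 2.48)
```

Pairing scores by scenario before averaging, as assumed here, removes the variation between scenarios that affects both methods equally, which is why the figures plot improvements rather than raw scores.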
