Demand Selection for VRP with Emission Quota

Farid Najar, Dominique Barth, Yann Strozecki

TL;DR

This work tackles demand selection under an emission quota in VRP (QVRP) by introducing Maximum Feasible Vehicle Assignment (MFVA), a two-layer problem separating assignment and routing. The routing layer is solved with classical OR techniques, while the assignment layer is explored via greedy, dynamic programming, simulated annealing, and learning-based methods including RL and multi-agent approaches. Across synthetic and real-data experiments, classical OR-based methods, especially metaheuristics, consistently outperform learning-based approaches in static MFVA/QVRP settings, with RL and decentralized learners showing limited generalization or efficiency. The findings suggest that pure learning or end-to-end approaches may be unsuitable for static combinatorial optimization, though hybrid or dynamic variants could still benefit from learning in future work.

Abstract

Combinatorial optimization (CO) problems are traditionally addressed using Operations Research (OR) methods, including metaheuristics. In this study, we introduce a demand selection problem for the Vehicle Routing Problem (VRP) with an emission quota, referred to as QVRP. The objective is to minimize the number of omitted deliveries while respecting the pollution quota. We focus on the demand selection part, called Maximum Feasible Vehicle Assignment (MFVA), while the construction of a routing for the VRP instance is solved using classical OR methods. We propose several methods for selecting the packages to omit, both from machine learning (ML) and OR. Our results show that, in this static problem setting, classical OR-based methods consistently outperform ML-based approaches.
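To make the selection objective concrete, here is a minimal toy sketch of greedy demand selection under an emission quota. It is not the authors' implementation: the single vehicle, the nearest-neighbour tour used in place of a real VRP solver, the linear emission model, and the names `route_length`, `greedy_select`, and `emission_per_km` are all assumptions made for illustration.

```python
import math

def route_length(depot, stops):
    """Length of a nearest-neighbour tour depot -> stops -> depot.

    Stand-in for the routing layer; a real solver would replace this.
    """
    remaining = list(stops)
    pos, total = depot, 0.0
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(pos, p))
        total += math.dist(pos, nxt)
        remaining.remove(nxt)
        pos = nxt
    return total + math.dist(pos, depot)

def greedy_select(depot, stops, quota, emission_per_km=1.0):
    """Omit deliveries one at a time until the tour's emissions fit the quota.

    Assumes emissions are proportional to distance (hypothetical model).
    Returns (kept, omitted).
    """
    kept, omitted = list(stops), []
    while kept and emission_per_km * route_length(depot, kept) > quota:
        # Drop the stop whose removal leaves the shortest remaining tour.
        i = min(range(len(kept)),
                key=lambda j: route_length(depot, kept[:j] + kept[j + 1:]))
        omitted.append(kept.pop(i))
    return kept, omitted

depot = (0.0, 0.0)
stops = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (0.0, 1.0)]
kept, omitted = greedy_select(depot, stops, quota=8.0)
```

In this toy instance the far-away stop is the only one omitted, since removing it alone brings the tour under the quota. The sketch only illustrates the two-layer idea: the selection loop calls the routing layer as a black box.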

Paper Structure

This paper contains 31 sections, 2 theorems, 6 equations, 11 figures, 1 table.

Key Result

Theorem

When restricted to instances where the fleet of vehicles has a fixed number of different coefficients $Ef$ and $Cf$, $\textsc{Shortcut} \in \mathsf{P}$.

Figures (11)

  • Figure 1: The performance of different methods over $100$ instances with the same parameters. On the vertical axis, the rewards relative to a reference algorithm (higher is better) or execution times (lower is better).
  • Figure 2: The performance of different methods for $d=20$ with destinations that have different quantities, over 100 instances with the same parameters.
  • Figure 3: The performance of the RL agent compared to other methods. The experiments are done on a single instance, with the same routes and destinations at every episode. The learning curve is the mean over 10 different instances picked randomly.
  • Figure 4: The performance of the RL agent trained on 200 different instances compared to the average performance of DP in the scenario with $d=50$. On the right-hand side, we keep 80% of destinations unchanged in all instances.
  • Figure 5: The performance of different methods using synthetic data. On the vertical axis, the rewards relative to a reference algorithm (higher is better) or execution times (lower is better).
  • ...and 6 more figures

Theorems & Definitions (3)

  • Theorem
  • Theorem
  • Proof