Multi-Agent Environments for Vehicle Routing Problems

Ricardo Gama, Daniel Fuertes, Carlos R. del-Blanco, Hugo L. Fernandes

TL;DR

A library of multi-agent environments that simulate classic vehicle routing problems. Its flexible, modular architecture allows easy customization and incorporation of new routing problems, enabling rapid adoption and easy integration into existing reinforcement learning frameworks.

Abstract

Research on Reinforcement Learning (RL) approaches for discrete optimization problems has increased considerably, extending RL to an area classically dominated by Operations Research (OR). Vehicle routing problems are a good example of discrete optimization problems with high practical relevance where RL techniques have had considerable success. Despite these advances, open-source development frameworks remain scarce, hampering both the testing of algorithms and the ability to objectively compare results. This ultimately slows down progress in the field and limits the exchange of ideas between the RL and OR communities. Here we propose a library of multi-agent environments that simulate classic vehicle routing problems. The library, built on PyTorch, provides a flexible, modular architecture that allows easy customization and incorporation of new routing problems. It follows the Agent Environment Cycle ("AEC") games model and has an intuitive API, enabling rapid adoption and easy integration into existing reinforcement learning frameworks. The library allows straightforward use of classical OR benchmark instances, in order to narrow the gap between the test beds for algorithm benchmarking used by the RL and OR communities. Additionally, we provide benchmark instance sets for each environment, as well as baseline RL models and training code.

Paper Structure

This paper contains 18 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 2.1: Illustration of a multi-agent VRP instance with four vehicles (left) and the corresponding timeline (right). All vehicles have completed a series of three action steps. Since agents act asynchronously, the next agent to interact with the environment might have only partial information available.
  • Figure 3.1: Schematic representation of the library architecture.
  • Figure 4.1: Evolution of model performance during training, using single and smallest time agent selection strategies, averaged by epoch. Each problem has 100 services. For the CVRPTW the fleet has 25 vehicles, and for both TOPTW and PCVRPTW it has 5 vehicles.
  • Figure B.1: Illustration of MADyAM architecture. (I) Encoding block consisting of $n$ transformer encoder layers; (II) Embedding layers responsible for the projection of the observation into the embedding space; (III) Attention glimpse layer; (IV) Pointer layer outputting action probabilities.
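
The captions for Figures 2.1 and 4.1 mention agents acting asynchronously, one at a time, and a "smallest time" agent selection strategy. The toy sketch below illustrates one plausible reading of that idea: at each step, the vehicle whose clock is earliest acts next. It is not the library's API. The names `Vehicle`, `select_smallest_time`, and `run_episode`, the placeholder policy, and the travel times are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    """Hypothetical agent state: a name, a clock, and the route so far."""
    name: str
    time: float = 0.0
    route: list = field(default_factory=list)


def select_smallest_time(vehicles):
    """Return the vehicle whose clock is earliest (ties go to the first listed)."""
    return min(vehicles, key=lambda v: v.time)


def run_episode(vehicles, services, travel_time):
    """Let agents act one at a time until every service is assigned."""
    pending = list(services)
    order = []  # records which agent acted at each step
    while pending:
        agent = select_smallest_time(vehicles)
        service = pending.pop(0)  # placeholder policy: take the next service
        agent.route.append(service)
        agent.time += travel_time[service]  # advance only this agent's clock
        order.append(agent.name)
    return order


# Assumed toy data: service "a" takes much longer than the others.
vehicles = [Vehicle("v0"), Vehicle("v1")]
travel_time = {"a": 5.0, "b": 1.0, "c": 1.0, "d": 1.0}
order = run_episode(vehicles, ["a", "b", "c", "d"], travel_time)
print(order)
```

Because `v0` is busy with the long service `a`, `v1` acts three times in a row. This shows why, under asynchronous stepping, the agent that acts next may see only part of what the other agents will eventually do.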