Reinforcement Learning for Multi-Truck Vehicle Routing Problems

Joshua Levin; Randall Correll; Takanori Ide; Takafumi Suzuki; Saito Takaho; Alan Arai

TL;DR

This paper tackles realistic vehicle routing problems that feature multiple trucks and multi-leg routing requirements by extending encoder-decoder attention models with tensor-based demand representations and a two-phase planning workflow. It decomposes large problems into fixed-size subproblems, makes routing decisions with an RL-trained multi-truck decoder, resolves box pickups and dropoffs with heuristics, and updates a tensor demand structure after each iteration. The authors introduce dynamical masking to incorporate rank-2 demand into the attention mechanism and test the approach on a real-world Aisin VRP, finding a 138-truck solution that outperforms the prior 142-truck benchmark. The work moves RL for VRPs toward industrial scale and discusses practical considerations and future enhancements for real-world deployment.

Abstract

Deep reinforcement learning (RL) has been shown to be effective in producing approximate solutions to some vehicle routing problems (VRPs), especially when using policies generated by encoder-decoder attention mechanisms. While these techniques have been quite successful for relatively simple problem instances, there are still under-researched and highly complex VRP variants for which no effective RL method has been demonstrated. In this work we focus on one such VRP variant, which contains multiple trucks and multi-leg routing requirements. In these problems, demand is required to move along sequences of nodes, instead of just from a start node to an end node. With the goal of making deep RL a viable strategy for real-world industrial-scale supply chain logistics, we develop new extensions to existing encoder-decoder attention models which allow them to handle multiple trucks and multi-leg routing requirements. Our models have the advantage that they can be trained for a small number of trucks and nodes, and then embedded into a large supply chain to yield solutions for larger numbers of trucks and nodes. We test our approach on a real supply chain environment arising in the operations of Japanese automotive parts manufacturer Aisin Corporation, and find that our algorithm outperforms Aisin's previous best solution.

Paper Structure (23 sections, 26 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Vehicle Routing Problems
Basic Vehicle Routing Problem
Generalized Vehicle Routing Problem
Supply Chain Management Workflow
Tensor Demand Structure
Subenvironment Search
Phase 1: Route-finding
Phase 2: Pickup and Dropoff Decisions
Total Demand Update
Policy Neural Network for Routing Decisions
Reinforcement Learning for Vehicle Routing Problems
Encoder
Multi-head Attention Mechanism
Decoder
...and 8 more sections

Figures (4)

Figure 1: An example training curve showing the cost function, Eq. (\ref{eq:cost}), over 400 training epochs on subenvironments containing three trucks and five nodes.
Figure 2: Number of trucks required as a function of total initial demand. Total initial demand is given as a fraction of the total demand of the Aisin VRP. Note that for the full-scale problem, our algorithm finds a solution using 138 trucks, thus outperforming Aisin's 142-truck solution. However, as seen on the left side of the plot, the algorithm is less effective when a small amount of demand is spread over many nodes.
Figure 3: Truck-routing connectivity graph for the 138-truck solution obtained using the algorithm described in this paper. The underlying map shows an area of approximately 85 km by 85 km of Nagoya, Japan. Note that this figure only shows which edges are used in the solution, and does not show the demand flow along each edge, which direction trucks are driving, or timing details.
Figure 4: The percentage of the total initial demand volume delivered by each 3-truck team. The first teams deployed are the most efficient due to the abundance of demand available to them.
