Reinforcement Learning for Multi-Truck Vehicle Routing Problems
Joshua Levin, Randall Correll, Takanori Ide, Takafumi Suzuki, Saito Takaho, Alan Arai
TL;DR
This paper tackles realistic vehicle routing problems with multiple trucks and multi-leg routing requirements by extending encoder-decoder attention models with tensor-based demand representations and a two-phase planning workflow. The method decomposes a large problem into fixed-size subproblems, makes routing decisions with a multi-truck attention decoder trained by RL, and resolves box pickups and dropoffs with heuristics, iteratively updating a tensor demand structure as it goes. The authors introduce dynamical masking to incorporate rank-2 demand into the attention mechanism. On a real-world Aisin VRP, the approach yields a 138-truck solution, improving on the prior 142-truck benchmark. The work advances RL for industrial-scale VRPs and discusses practical considerations and future enhancements for real-world deployment.
Abstract
Deep reinforcement learning (RL) has been shown to be effective in producing approximate solutions to some vehicle routing problems (VRPs), especially when using policies generated by encoder-decoder attention mechanisms. While these techniques have been quite successful for relatively simple problem instances, there are still under-researched and highly complex VRP variants for which no effective RL method has been demonstrated. In this work we focus on one such VRP variant, which contains multiple trucks and multi-leg routing requirements. In these problems, demand is required to move along sequences of nodes, instead of just from a start node to an end node. With the goal of making deep RL a viable strategy for real-world industrial-scale supply chain logistics, we develop new extensions to existing encoder-decoder attention models which allow them to handle multiple trucks and multi-leg routing requirements. Our models have the advantage that they can be trained for a small number of trucks and nodes, and then embedded into a large supply chain to yield solutions for larger numbers of trucks and nodes. We test our approach on a real supply chain environment arising in the operations of Japanese automotive parts manufacturer Aisin Corporation, and find that our algorithm outperforms Aisin's previous best solution.
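As a rough illustration of how rank-2 demand might drive a mask over a decoder's next-node choices, the following minimal sketch may help. It is not the authors' implementation. It assumes that `demand[i][j]` is the volume waiting at node `i` whose next required stop is node `j`, and that `onboard[j]` is the volume already on the truck bound for `j`. The helper names `visit`, `feasible_mask`, and `masked_softmax` are hypothetical, truck capacity is ignored, and the masking rule (a node is eligible only if it offers a pickup or a dropoff) is a simplification chosen for the example.

```python
import math


def visit(demand, onboard, node):
    """Unload cargo bound for `node`, then load everything waiting there.

    Hypothetical helper: capacity limits are ignored for simplicity.
    """
    onboard[node] = 0
    for dest in range(len(demand)):
        onboard[dest] += demand[node][dest]
        demand[node][dest] = 0


def feasible_mask(demand, onboard, current):
    """Return True for each node the truck could usefully visit next."""
    mask = []
    for node in range(len(demand)):
        has_pickup = any(volume > 0 for volume in demand[node])
        has_dropoff = onboard[node] > 0
        mask.append(node != current and (has_pickup or has_dropoff))
    return mask


def masked_softmax(logits, mask):
    """Softmax over the unmasked logits; masked entries get probability 0."""
    kept = [logit for logit, ok in zip(logits, mask) if ok]
    if not kept:
        return [0.0] * len(logits)
    peak = max(kept)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) if ok else 0.0
               for logit, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Three nodes: 3 units must go 0 -> 1, and 2 units must go 1 -> 2.
demand = [
    [0, 3, 0],
    [0, 0, 2],
    [0, 0, 0],
]
onboard = [0, 0, 0]

visit(demand, onboard, 0)                      # truck starts at node 0 and loads
first_mask = feasible_mask(demand, onboard, current=0)
visit(demand, onboard, 1)                      # drop at node 1, load cargo for 2
second_mask = feasible_mask(demand, onboard, current=1)
probs = masked_softmax([0.5, 1.0, 2.0], second_mask)
```

In this toy run the mask changes after each stop because the demand matrix and the truck's cargo change. That is the sense in which the mask is dynamic here. A real decoder would apply the same kind of mask to its attention logits before sampling the next node.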
