Using Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem

Stefan Schoepf; Stephen Mak; Julian Senoner; Liming Xu; Torbjörn Netland; Alexandra Brintrup

TL;DR

This work addresses the three-dimensional loading capacitated vehicle routing problem (3L-CVRP) with an attention-based encoder–decoder framework, guided by proximal policy optimization (PPO) and a packing heuristic, that solves routing and 3D packing jointly. The approach achieves near-linear compute-time scaling and competitive routing performance, with average gaps to state-of-the-art methods on benchmark instances ranging from roughly 0.75% to 11.86% depending on the setup. It also contributes an open-source RL environment for 3L-CVRP to support further research and practical exploration. Overall, the study demonstrates the potential of reinforcement learning for large-scale, globally aware logistics optimization that reduces emissions and improves efficiency.

Abstract

Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream. GitHub: https://github.com/if-loops/3L-CVRP
