An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Changhao Miao, Yuntian Zhang, Tongyu Wu, Fang Deng, Chen Chen

TL;DR

This work addresses capacitated location-routing problems (CLRPs), covering both the CLRP and the open CLRP (OCLRP), by introducing DRLHQ, an end-to-end DRL framework built on an encoder-decoder architecture. It reformulates CLRPs as a Markov decision process (MDP) and employs a heterogeneous querying attention mechanism with a dynamic masking policy. The method integrates location and routing decisions within a single MDP, leveraging POMO-based training and a GRU-enhanced location query to capture interdependencies, while instance augmentation and simulation-based beam search improve inference. Experimental results on synthetic and Prins benchmark datasets show DRLHQ achieving superior solution quality and generalization compared with exact solvers, classical heuristics, and prior DRL approaches, with ablations confirming the value of dynamic masking and heterogeneous queries. The approach offers a practical, scalable, end-to-end solution for CLRPs, with potential impact in supply chains, emergency planning, and disaster relief, where joint facility and routing decisions are critical. Mathematical formulations, such as the CLRP objective $\min \; \sum_{i \in I} O_i y_i + \sum_{i \in V} \sum_{j \in V} \sum_{k \in K} c_{ij} x_{ijk} + \sum_{i \in I} \sum_{j \in J} \sum_{k \in K} F x_{ijk}$, and the MDP components are embedded within the learning framework to guide policy optimization and feasible solution construction.
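
To make the three terms of the objective concrete, the following is a minimal sketch of how the cost of a given solution could be evaluated: depot opening costs $O_i$, travel costs $c_{ij}$, and a fixed cost $F$ per vehicle used. It is an illustration under stated assumptions, not the paper's implementation: the function name `clrp_cost`, the representation of a solution as a list of `(depot, customers)` routes, and the use of Euclidean distance for $c_{ij}$ are all hypothetical choices. The `open_routes` flag assumes that in the OCLRP a vehicle does not return to its depot.

```python
import math

def clrp_cost(coords, depot_open_cost, vehicle_fixed_cost, routes, open_routes=False):
    """Evaluate the three-term objective for a given solution (illustrative only).

    coords:             dict mapping node id -> (x, y); Euclidean c_ij is an assumption
    depot_open_cost:    dict mapping depot id -> opening cost O_i
    vehicle_fixed_cost: fixed cost F charged once per vehicle (route) used
    routes:             list of (depot, [customer, ...]) pairs, one per vehicle
    open_routes:        if True, vehicles do not return to the depot (assumed OCLRP)
    """
    def dist(a, b):
        return math.dist(coords[a], coords[b])

    # Term 1: opening cost of every depot that serves at least one route (y_i = 1).
    opened = {depot for depot, customers in routes if customers}
    total = sum(depot_open_cost[d] for d in opened)

    for depot, customers in routes:
        if not customers:
            continue  # an empty route uses no vehicle
        path = [depot] + list(customers)
        if not open_routes:
            path.append(depot)  # closed tour: return to the starting depot
        # Term 2: travel cost along consecutive edges of the route.
        total += sum(dist(a, b) for a, b in zip(path, path[1:]))
        # Term 3: one fixed cost F per vehicle leaving a depot.
        total += vehicle_fixed_cost
    return total
```

For example, with one depot at the origin, customers at (3, 4) and (3, 0), an opening cost of 10, and a fixed vehicle cost of 2, a single closed route has travel cost 5 + 4 + 3 = 12 and total cost 24; the corresponding open route skips the return leg and costs 21.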

Abstract

The capacitated location-routing problems (CLRPs) are classical problems in combinatorial optimization that require making location and routing decisions simultaneously. In CLRPs, the complex constraints and the intricate relationships between various decisions make the problem challenging to solve. Deep reinforcement learning (DRL) has been extensively applied to the vehicle routing problem and its variants, while research on CLRPs remains largely unexplored. In this paper, we propose DRL with heterogeneous query (DRLHQ) to solve both the CLRP and the open CLRP (OCLRP). We are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate the CLRPs as a Markov decision process tailored to various decisions, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency across location and routing decisions, we also introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to various decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate superior solution quality and better generalization performance of our proposed approach over representative traditional and DRL-based baselines in solving both CLRP and OCLRP.

Paper Structure

This paper contains 30 sections, 17 equations, 4 figures, 7 tables, 2 algorithms.

Figures (4)

  • Figure 1: An illustrative example of CLRP. The decision process of CLRP can be divided into three partitions: (a) Facility Location, (b) Customer Allocation, and (c) Vehicle Routing. The decisions across different partitions are highly interdependent and strongly coupled. Each depot and vehicle is subject to capacity constraints, which makes it challenging to solve the CLRP.
  • Figure 2: The overall pipeline of DRLHQ. We propose a heterogeneous querying attention mechanism that invokes distinct query vectors tailored to various decision stages: (a) After completing a subtour, we construct a location query to determine the starting depot for the next subtour; (b) During the traversal of a subtour, we construct a routing query to select the next node to visit within the current subtour. Specifically, we introduce a GRU module into the construction of the location query to capture the decision dependencies among depots.
  • Figure 3: Illustration of using our heterogeneous querying attention mechanism to solve an instance involving two depots and five customers. The upper portion shows how the heterogeneous query is computed. The lower portion shows how the solution is constructed from the selected nodes. At each step $t$, the mechanism computes attention scores from the node embeddings $h_i$ and uses them to decide which node to select. Specifically, the query is either the location query $q_L$ or the routing query $q_R$, chosen adaptively according to the current decision-making stage.
  • Figure 4: Generalization results on larger problem scales for CLRP and OCLRP. Our DRLHQ consistently outperforms all baselines across all problem sizes.
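
The query switching described in the captions of Figures 2 and 3 can be sketched as follows. This is a simplified illustration, not the paper's model: the function name `select_node`, the greedy argmax choice, the scaled dot-product scoring, and the caller-supplied `feasible` mask (standing in for the dynamic masking policy) are all assumptions. The learned projections and the GRU module that builds the location query are omitted, so `q_location` and `q_routing` are simply passed in as precomputed vectors.

```python
import math

def select_node(embeddings, q_location, q_routing, feasible, subtour_done):
    """Choose the next node with a stage-dependent query (illustrative only).

    embeddings:   list of node embedding vectors h_i (lists of floats)
    q_location:   location query q_L, used right after a subtour is completed
    q_routing:    routing query q_R, used while a subtour is being traversed
    feasible:     list of booleans; False marks nodes excluded by the mask
    subtour_done: True if the previous subtour has just been completed
    """
    # Heterogeneous query: pick q_L or q_R according to the decision stage.
    query = q_location if subtour_done else q_routing
    scale = math.sqrt(len(query))

    # Scaled dot-product score per node; masked nodes get -inf.
    scores = [
        sum(a * b for a, b in zip(h, query)) / scale if ok else -math.inf
        for h, ok in zip(embeddings, feasible)
    ]
    best_score = max(scores)
    if best_score == -math.inf:
        raise ValueError("no feasible node to select")

    # Numerically stable softmax; masked nodes receive probability 0.
    exps = [math.exp(s - best_score) for s in scores]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # Greedy choice; a trained policy could instead sample from probs.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

In this sketch the same embeddings yield different selections depending on the stage, because only the query changes; the mask guarantees that an excluded node, such as a customer whose demand exceeds the remaining vehicle capacity, is never chosen.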