
Quantum Reinforcement Learning with Transformers for the Capacitated Vehicle Routing Problem

Eva Andrés

TL;DR

This work presents a quantum reinforcement learning framework for the dynamic, multi-vehicle CVRP by integrating Transformer-based architectures with an Advantage Actor-Critic algorithm. It introduces three variants—Classical Pointer Network (CPN), Hybrid Quantum Pointer Network (HQP), and Fully Quantum Pointer Network (FQP)—and evaluates them in a consistent CVRP environment incorporating overlap and zonification penalties. Results show that quantum-enhanced models can yield more robust and organized routing policies, with HQP often delivering the best overall performance, albeit at higher computational cost during training. The study highlights the potential and challenges of quantum-classical hybrids for complex combinatorial routing problems and outlines a path toward more expressive and scalable quantum routing agents.
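As a rough illustration of the Advantage Actor-Critic update mentioned above, the sketch below computes bootstrapped discounted returns, the advantages A_t = G_t - V(s_t), and the resulting actor and critic losses for a single rollout. It is a minimal sketch under stated assumptions, not the paper's implementation. The function names `discounted_returns` and `a2c_losses`, the discount factor, the value-loss coefficient, and the reward signal (for example, negative travel distance minus overlap and zonification penalties) are all hypothetical. Plain floats stand in for the tensors and automatic differentiation that a real training loop would use.

```python
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], bootstrap: float, gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, seeded with the critic's estimate of the last state."""
    returns: List[float] = []
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def a2c_losses(log_probs: Sequence[float], values: Sequence[float], rewards: Sequence[float],
               bootstrap: float = 0.0, gamma: float = 0.99,
               value_coef: float = 0.5) -> Tuple[float, float]:
    """Hypothetical helper returning (actor_loss, critic_loss) for one rollout.

    Assumption: rewards are per-step CVRP rewards (e.g. negative distance minus penalties);
    the paper's exact reward shaping is not reproduced here.
    """
    returns = discounted_returns(rewards, bootstrap, gamma)
    # Advantage: how much better the observed return was than the critic predicted.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(advantages)
    # Policy-gradient term; in a real framework the advantage would be detached from the graph.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # The critic regresses V(s_t) toward G_t, i.e. minimizes the mean squared advantage.
    critic_loss = value_coef * sum(a * a for a in advantages) / n
    return actor_loss, critic_loss


if __name__ == "__main__":
    print(a2c_losses(log_probs=[-1.0, -2.0], values=[1.0, 0.5], rewards=[1.0, 1.0], gamma=0.5))
    # prints (0.75, 0.125)
```

In practice an entropy bonus is often added to the actor loss to encourage exploration; it is omitted here to keep the arithmetic visible.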

Abstract

This paper addresses the Capacitated Vehicle Routing Problem (CVRP) by comparing classical and quantum Reinforcement Learning (RL) approaches. An Advantage Actor-Critic (A2C) agent is implemented in classical, fully quantum, and hybrid variants, integrating Transformer architectures to capture the relationships between vehicles, clients, and the depot through self- and cross-attention mechanisms. The experiments focus on multi-vehicle scenarios with capacity constraints, considering 20 clients and 4 vehicles, and are conducted over ten independent runs. Performance is assessed using routing distance, route compactness, and route overlap. The results show that all three approaches learn effective routing policies; however, the quantum-enhanced models outperform the classical baseline and produce more robust route organization, with the hybrid architecture achieving the best overall performance across distance, compactness, and route overlap. In addition to these quantitative improvements, qualitative visualizations reveal that the quantum-based models generate more structured and coherent routing solutions. These findings highlight the potential of hybrid quantum-classical reinforcement learning models for addressing complex combinatorial optimization problems such as the CVRP.
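The cross-attention between vehicles and clients can be pictured, in pointer-network style, as scoring each client against the current vehicle's embedding and masking any client that has already been served or whose demand exceeds the vehicle's remaining capacity. The following is a hypothetical, dependency-free sketch of that masking step only. The embeddings, the function name `masked_pointer_probs`, and the convention of returning all zeros (signalling a return to the depot) when no client is feasible are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def masked_pointer_probs(query: Sequence[float],
                         client_keys: Sequence[Sequence[float]],
                         demands: Sequence[float],
                         remaining_capacity: float,
                         visited: Sequence[bool]) -> List[float]:
    """Hypothetical helper: selection probabilities over clients for one vehicle.

    Scores are scaled dot products between the vehicle query and each client key;
    served clients and clients whose demand exceeds the remaining capacity are masked.
    """
    scale = math.sqrt(len(query))
    scores: List[float] = []
    for key, demand, served in zip(client_keys, demands, visited):
        if served or demand > remaining_capacity:
            scores.append(-math.inf)  # infeasible client: probability forced to 0
        else:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
    feasible = [s for s in scores if s != -math.inf]
    if not feasible:
        # Assumed convention: no feasible client means the vehicle returns to the depot.
        return [0.0] * len(scores)
    top = max(feasible)  # subtract the max for a numerically stable softmax
    weights = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = masked_pointer_probs(query=[1.0, 0.0],
                                 client_keys=[[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
                                 demands=[1.0, 1.0, 5.0],
                                 remaining_capacity=3.0,
                                 visited=[False, False, False])
    print(probs)  # the third client exceeds capacity, so its probability is 0.0
```

Masking with -inf before the softmax, rather than zeroing probabilities afterwards, keeps the distribution normalized over feasible clients only.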

Paper Structure

This paper contains 28 sections, 33 equations, 10 figures, and 1 table.

Figures (10)

  • Figure 1: CPN architecture
  • Figure 2: HQP architecture
  • Figure 3: FQP architecture
  • Figure 4: Architectural and training configuration of the three evaluated models.
  • Figure 5: Boxplots illustrating the distribution of average distance obtained by the quantum models (HQP, FQP) and the classical model (CPN).
  • ...and 5 more figures