
Learning to Solve Orienteering Problem with Time Windows and Variable Profits

Songqun Gao, Zanxi Ruan, Patrick Floor, Marco Roveri, Luigi Palopoli, Daniele Fontanelli

TL;DR

DeCoST (DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory) is a learning-based two-stage method that decouples the discrete and continuous decision variables in OPTWVP while enabling efficient, learnable coordination between them.

Abstract

The orienteering problem with time windows and variable profits (OPTWVP) is common in many real-world applications and involves continuous time variables. Existing approaches do not provide an efficient solver for this orienteering problem variant, which combines discrete and continuous variables. In this paper, we propose a learning-based two-stage DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which aims to decouple the discrete and continuous decision variables in OPTWVP while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure predicts the path and the initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides a long-horizon learning of structure estimation. We rigorously prove the global optimality of the second-stage solution. Experiments on OPTWVP instances demonstrate that DeCoST outperforms both state-of-the-art constructive solvers and the latest meta-heuristic algorithms in terms of solution quality and computational efficiency, achieving up to 6.6x inference speedup on instances with fewer than 500 nodes. Moreover, the proposed framework is compatible with various constructive solvers and consistently enhances their solution quality for OPTWVP.

Paper Structure

This paper contains 39 sections, 1 theorem, 28 equations, 4 figures, 10 tables, 1 algorithm.

Key Result

Theorem 4.1

The service time optimization (STO) algorithm (Algorithm 1) returns an optimal solution $(s^*, d^*)$ to the service time scheduling problem.
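
To make the object of the theorem concrete, the sketch below evaluates one candidate pair $(s, d)$ on a fixed path: it derives the start times $s$ from chosen service durations $d$, checks the time windows, and sums the profit. It is not the paper's STO algorithm and does not search for the optimum. The `Node` fields, the window semantics (service must finish before the window closes), the linear profit rate, and the function name `evaluate_schedule` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical node description; field names are illustrative only."""
    open: float         # earliest time service may start
    close: float        # latest time service must finish (assumed semantics)
    rate: float         # profit per unit of service time (assumed linear)
    max_service: float  # upper bound on the service duration


def evaluate_schedule(
    nodes: Sequence[Node],
    travel: Sequence[float],
    d: Sequence[float],
    t0: float = 0.0,
) -> Optional[Tuple[List[float], float]]:
    """Return (start times s, total profit) for durations d, or None if infeasible.

    nodes  : nodes in visiting order (the path is fixed).
    travel : travel[i] is the travel time from the previous location to nodes[i].
    d      : d[i] is the service duration chosen for nodes[i].
    """
    if not (len(nodes) == len(travel) == len(d)):
        raise ValueError("nodes, travel and d must have the same length")

    s: List[float] = []
    t = t0          # time at which the previous service ended
    profit = 0.0
    for node, tt, dur in zip(nodes, travel, d):
        if dur < 0 or dur > node.max_service:
            return None                 # duration outside its bounds
        start = max(t + tt, node.open)  # wait if arriving before the window opens
        end = start + dur
        if end > node.close:
            return None                 # service would overrun the window
        s.append(start)
        profit += node.rate * dur
        t = end
    return s, profit


path = [Node(open=0, close=10, rate=2, max_service=5),
        Node(open=12, close=20, rate=1, max_service=4)]
result = evaluate_schedule(path, travel=[1, 3], d=[4, 3])
```

Under these assumptions, a second-stage optimizer would search over `d` for the feasible allocation with the highest profit; with linear profits and linear window constraints that search is an LP, which is consistent with the formulation the abstract describes.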

Figures (4)

  • Figure 1: (a) Application of OPTWVP in industrial scenarios. A manipulator and a human collaborate on a defect removal task on the assembly line. The robot has limited time to operate. For safety reasons, it cannot access nodes outside their time windows, since the human might collide with the robot. Meanwhile, the reward depends on the service time spent handling the defect, with the defect size decreasing linearly over time [industrial_example]. (b) Illustration of the challenge faced by NCO methods in allocating hybrid discrete-continuous decisions and capturing continuous-variable representations.
  • Figure 2: An overview of the DeCoST approach. The upper part illustrates the two-stage collaborative optimization process, while the lower part presents the details of the STO algorithm (Algorithm 1).
  • Figure 3: Boxplot of the optimality gap under different problem sizes and time window settings.
  • Figure 4: Analysis of initial service time ratio versus optimality gap.

Theorems & Definitions (2)

  • Theorem 4.1
  • proof