
All You Need is an Improving Column: Enhancing Column Generation for Parallel Machine Scheduling via Transformers

Amira Hijazi, Osman Ozaltin, Reha Uzsoy

TL;DR

This work presents a neural network-enhanced column generation (CG) approach for a parallel machine scheduling problem. The approach generalizes both to unseen, larger problem instances drawn from the same probability distribution and to instances drawn from probability distributions different from those seen during training.

Abstract

We present a neural network-enhanced column generation (CG) approach for a parallel machine scheduling problem. The proposed approach uses an encoder-decoder attention model, namely the transformer and pointer architectures, to generate job sequences with negative reduced cost, which become columns added to the master problem. By training the neural network offline and using it in inference mode to predict negative reduced cost columns, we achieve significant computational time savings compared to dynamic programming (DP). Because the exact DP procedure is used at termination to verify that no further columns with negative reduced cost exist, the optimality guarantee of the original CG procedure is preserved. For small to medium-sized instances, our approach achieves an average 45% reduction in computation time compared to solving the subproblems with DP. Furthermore, the model generalizes both to unseen, larger problem instances drawn from the same probability distribution and to instances drawn from probability distributions different from those seen during training. For large-sized instances, the proposed approach achieves an 80% improvement in the objective value in under 500 seconds, demonstrating both its scalability and its efficiency.
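The pricing strategy in the abstract (let the trained network propose columns, and fall back to exact DP only when it finds none) can be sketched as a small control-flow function. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a set-partitioning master problem in which each column is one machine's job sequence and its reduced cost is the column cost minus the duals of the jobs it covers minus the dual of the machine-count constraint. `price`, `reduced_cost`, `nn_pricer` and `dp_pricer` are hypothetical names, and total completion time is used as a stand-in cost because this excerpt does not state the paper's objective.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Column = List[int]  # one machine's job sequence (a candidate column)
CostFn = Callable[[Sequence[int]], float]
Pricer = Callable[[Dict[int, float], float], List[Column]]  # hypothetical signature


def reduced_cost(col: Sequence[int], cost: CostFn,
                 job_duals: Dict[int, float], machine_dual: float) -> float:
    """Assumed set-partitioning form: column cost minus the duals of the jobs
    it covers minus the dual of the machine-count constraint."""
    return cost(col) - sum(job_duals[j] for j in col) - machine_dual


def price(job_duals: Dict[int, float], machine_dual: float, cost: CostFn,
          nn_pricer: Pricer, dp_pricer: Pricer,
          tol: float = 1e-9) -> Tuple[List[Column], bool]:
    """Return (improving columns, whether exact DP pricing was run)."""
    def improving(cols: List[Column]) -> List[Column]:
        return [c for c in cols
                if reduced_cost(c, cost, job_duals, machine_dual) < -tol]

    # 1) Cheap path: the trained network proposes job sequences in inference mode.
    found = improving(nn_pricer(job_duals, machine_dual))
    if found:
        return found, False
    # 2) Exact path: DP either finds improving columns or certifies that none
    #    exist, so the restricted master LP is optimal and CG can stop.
    return improving(dp_pricer(job_duals, machine_dual)), True


# Hypothetical toy data: processing times, with total completion time as cost.
p = {1: 2, 2: 1, 3: 3}


def total_completion_time(col: Sequence[int]) -> float:
    t, total = 0, 0
    for j in col:
        t += p[j]
        total += t
    return total


duals = {1: 5.0, 2: 4.0, 3: 1.0}


def nn(d: Dict[int, float], m: float) -> List[Column]:
    return [[3]]  # stand-in network: proposes only a non-improving column


def dp(d: Dict[int, float], m: float) -> List[Column]:
    return [[2, 1]]  # stand-in DP: returns an improving column


print(price(duals, 0.0, total_completion_time, nn, dp))  # ([[2, 1]], True)
```

In the toy run the network's column `[3]` has reduced cost 3 - 1 = 2, so DP is invoked and returns `[2, 1]`, whose reduced cost is 4 - 9 = -5. Whenever the network does find an improving column, DP is skipped for that iteration, which is where the time savings would come from under this reading.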

Paper Structure

This paper contains 21 sections, 9 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: CG-DP vs CG-NN-DP
  • Figure 2: Transformer-Pointer Network with input $X = \{x_0, x_1, x_2, x_3, x_4\}$ and output $\{\Rightarrow, 3, 1, 2, 0, \Leftarrow\}$, which yields the schedule $[3,1,2]$. The elements $\Rightarrow$ and $\Leftarrow$ mark the beginning and end of the schedule, respectively, and $0$ denotes the machine element. Note that job $4$, represented by $x_4$, is not selected in the partial schedule.
  • Figure 3: Transformer-based Encoder and Decoder, adapted from Vaswani et al. (2017) [vaswani2017attention]; $N$ is the number of layers.
  • Figure 4: Autoregressive Inference of the Decoder. In the first step, the decoder takes as input the start element $\Rightarrow$ and the encoder output matrix $Z$, producing $o_0$. The pointer layer then uses $o_0$ and $Z$ to select $x_3$, as shown by the orange arrow. In subsequent steps, the previously generated tokens are passed to the decoder and excluded from further selection in the pointer attention layer.
  • Figure 5: Training and Validation Loss and Accuracy
  • Figures 6-8: captions not listed here.
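The decoding described in the Figure 2 and Figure 4 captions (point at one input element per step, mask elements already chosen, then read the schedule off the output tokens) can be sketched as a greedy pointer loop. This is a hedged sketch, not the paper's model: it assumes dot-product pointer scores (the original Pointer Network uses additive attention), greedy arg-max selection, and a stopping rule in which pointing at the machine element $x_0$ closes the schedule. `decoder_step` is a hypothetical stand-in for the transformer decoder, `"=>"`/`"<="` are ASCII stand-ins for $\Rightarrow$/$\Leftarrow$, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Set, Union

Token = Union[str, int]
START, END = "=>", "<="  # ASCII stand-ins for the begin/end symbols
MACHINE = 0              # index of the machine element x_0
Matrix = Sequence[Sequence[float]]
DecoderStep = Callable[[List[Token], Matrix], Sequence[float]]


def pointer_select(query: Sequence[float], Z: Matrix, selected: Set[int]) -> int:
    """One pointer-attention step over the rows of the encoder output Z."""
    # Dot-product scores (an assumption); already-selected rows are masked.
    scores = [-math.inf if i in selected
              else sum(a * b for a, b in zip(z, query))
              for i, z in enumerate(Z)]
    # Softmax over the unmasked rows; masked rows get probability 0.
    top = max(scores)
    weights = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Greedy decoding: point at the most probable input element.
    return max(range(len(probs)), key=lambda i: probs[i])


def greedy_decode(decoder_step: DecoderStep, Z: Matrix) -> List[Token]:
    tokens: List[Token] = [START]
    selected: Set[int] = set()
    while True:
        query = decoder_step(tokens, Z)       # decoder output o_t
        idx = pointer_select(query, Z, selected)
        tokens.append(idx)
        selected.add(idx)                     # exclude from later steps
        if idx == MACHINE:                    # assumed stopping rule
            break
    tokens.append(END)
    return tokens


def tokens_to_schedule(tokens: Sequence[Token]) -> List[int]:
    """Drop the begin/end markers and the machine element."""
    return [t for t in tokens if t not in (START, END, MACHINE)]


# Toy encoder output: one one-hot row per input element x_0..x_4.
Z = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]


def toy_step(tokens: List[Token], enc: Matrix) -> List[float]:
    return [0.5, 2.0, 1.0, 3.0, -1.0]  # hypothetical fixed decoder output


tokens = greedy_decode(toy_step, Z)
print(tokens)                      # ['=>', 3, 1, 2, 0, '<=']
print(tokens_to_schedule(tokens))  # [3, 1, 2]
```

With one-hot encoder rows the scores equal the toy query, so masking forces the order 3, 1, 2, then the machine element 0 ends decoding; job 4 is never selected, mirroring the Figure 2 example. A real decoder would condition `decoder_step` on the tokens generated so far.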