Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling

Bulent Soykan; Sean Mondesire; Ghaith Rabadi; Grace Bochenek

TL;DR

A Deep Reinforcement Learning framework that combines Proximal Policy Optimization (PPO) with a Graph Neural Network (GNN): the GNN encodes the scheduling state so that the PPO agent can learn a direct scheduling policy, providing a robust and scalable solution for complex manufacturing scheduling.

Abstract

The Unrelated Parallel Machine Scheduling Problem (UPMSP) with release dates, setups, and eligibility constraints presents a significant multi-objective challenge. Traditional methods struggle to balance minimizing Total Weighted Tardiness (TWT) and Total Setup Time (TST). This paper proposes a Deep Reinforcement Learning framework using Proximal Policy Optimization (PPO) and a Graph Neural Network (GNN). The GNN effectively represents the complex state of jobs, machines, and setups, allowing the PPO agent to learn a direct scheduling policy. Guided by a multi-objective reward function, the agent simultaneously minimizes TWT and TST. Experimental results on benchmark instances demonstrate that our PPO-GNN agent significantly outperforms a standard dispatching rule and a metaheuristic, achieving a superior trade-off between both objectives. This provides a robust and scalable solution for complex manufacturing scheduling.
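To make the two objectives concrete, the following minimal sketch evaluates TWT and TST for a fixed schedule. It is an illustration under stated assumptions, not the paper's implementation: the names `Job`, `evaluate_schedule`, and `scalarized_reward`, the data layout, the weights `alpha` and `beta`, the rule that a setup starts only once the machine is free and the job is released, and the zero setup before the first job on each machine are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float  # earliest time the job may start
    due: float      # due date
    weight: float   # tardiness weight


def evaluate_schedule(schedule, jobs, proc, setup, eligible):
    """Return (TWT, TST) for a fixed schedule.

    schedule: machine id -> list of job ids in processing order
    proc:     proc[m][j] is the processing time of job j on machine m
    setup:    setup[m][(i, j)] is the setup time on m when j follows i
    eligible: eligible[j] is the set of machines allowed to process job j
    """
    twt = 0.0
    tst = 0.0
    for m, sequence in schedule.items():
        clock = 0.0
        prev = None
        for j in sequence:
            if m not in eligible[j]:
                raise ValueError(f"job {j} is not eligible on machine {m}")
            # Assumption: no setup is needed before the first job on a machine.
            s = 0.0 if prev is None else setup[m][(prev, j)]
            # Assumption: setup begins once the machine is free and the job is released.
            start = max(clock, jobs[j].release) + s
            clock = start + proc[m][j]
            twt += jobs[j].weight * max(0.0, clock - jobs[j].due)
            tst += s
            prev = j
    return twt, tst


def scalarized_reward(twt, tst, alpha=1.0, beta=0.5):
    # Hypothetical weighted-sum scalarization; lower TWT and TST give higher reward.
    return -(alpha * twt + beta * tst)


# Small illustrative instance: 3 jobs, 2 machines.
jobs = {0: Job(0, 5, 2), 1: Job(1, 6, 1), 2: Job(0, 4, 3)}
proc = {0: {0: 3, 1: 4, 2: 2}, 1: {0: 4, 1: 2, 2: 5}}
setup = {0: {(0, 2): 1, (2, 0): 2}, 1: {}}
eligible = {0: {0, 1}, 1: {1}, 2: {0}}
schedule = {0: [2, 0], 1: [1]}

twt, tst = evaluate_schedule(schedule, jobs, proc, setup, eligible)
reward = scalarized_reward(twt, tst)
```

In this instance, job 0 finishes at time 7 against a due date of 5, so it contributes a weighted tardiness of 2 × 2 = 4, and the only setup incurred is the 2 time units between jobs 2 and 0. A weighted sum is only one way to combine the two objectives; the paper's reward function may differ.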

Paper Structure (23 sections, 2 figures, 1 table)

Introduction
Related Work
Parallel Machine Scheduling Problem (PMSP)
Deep Reinforcement Learning in Scheduling
Graph Neural Networks in Combinatorial Optimization
Problem Formulation
Formal Definition
Decision Variables
Constraints
Objective Functions
Methodology: PPO-GNN for Multi-Objective UPMSP
Markov Decision Process (MDP) Formulation
State Representation ($s_t \in \mathcal{S}$)
Action Space ($a_t \in \mathcal{A}$)
Reward Function ($r_t = \mathcal{R}(s_t, a_t, s_{t+1})$)
...and 8 more sections

Figures (2)

Figure 1: Overview of the DRL framework for UPMSP scheduling.
Figure 2: Comparison of scheduling methods across different problem sizes (n jobs, m machines).
