
Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling

Maria Zampella, Urtzi Otamendi, Xabier Belaunzaran, Arkaitz Artetxe, Igor G. Olaizola, Giuseppe Longo, Basilio Sierra

TL;DR

This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. It introduces a Reinforcement Learning environment for the problem and conducts empirical analyses comparing MARL with Single-Agent algorithms.

Abstract

Scheduling problems pose significant challenges in resource, industry, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces a Reinforcement Learning environment for this problem and conducts empirical analyses comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for both Single- and Multi-Agent approaches. The results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and of the Multi-Agent PPO (MAPPO) algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches face challenges in cooperative learning but show a capacity to scale. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need to balance algorithmic sophistication with scalability in intelligent scheduling solutions.
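To make the problem setting concrete, the following is a minimal sketch of a toy single-agent UPMS environment. It is not the authors' environment: the class name `UPMSEnv`, the flattened (machine, job) action encoding, the assumption that every setup occupies one worker from a shared pool while processing needs none, the greedy earliest-free-worker rule, and the sparse negative-makespan reward are all hypothetical choices made for illustration.

```python
class UPMSEnv:
    """Toy UPMS environment (hypothetical; not the paper's environment).

    proc[m][j]:     processing time of job j on machine m (unrelated machines).
    setup[m][i][j]: setup time on machine m when job j follows job i; row
                    index n_jobs holds setups from the machine's initial state.
    n_workers:      shared workers; ASSUMPTION: each setup occupies one worker.
    """

    def __init__(self, proc, setup, n_workers):
        self.proc = proc
        self.setup = setup
        self.n_machines = len(proc)
        self.n_jobs = len(proc[0])
        self.n_workers = n_workers
        self.reset()

    def reset(self):
        self.scheduled = [False] * self.n_jobs
        self.machine_free = [0] * self.n_machines        # time each machine becomes idle
        self.last_job = [self.n_jobs] * self.n_machines  # n_jobs means "no job yet"
        self.worker_free = [0] * self.n_workers          # time each worker becomes idle
        return self.action_mask()

    def action_mask(self):
        # Flattened action a = m * n_jobs + j; valid only if job j is unscheduled.
        return [not self.scheduled[a % self.n_jobs]
                for a in range(self.n_machines * self.n_jobs)]

    def step(self, action):
        m, j = divmod(action, self.n_jobs)
        if self.scheduled[j]:
            raise ValueError("invalid action: job already scheduled")
        # Greedily pick the worker who frees up first; the setup must wait
        # for both the machine and that worker.
        w = min(range(self.n_workers), key=lambda k: self.worker_free[k])
        start = max(self.machine_free[m], self.worker_free[w])
        setup_end = start + self.setup[m][self.last_job[m]][j]
        self.worker_free[w] = setup_end
        self.machine_free[m] = setup_end + self.proc[m][j]
        self.last_job[m] = j
        self.scheduled[j] = True
        done = all(self.scheduled)
        # Sparse reward: negative makespan once every job has been placed.
        reward = -max(self.machine_free) if done else 0.0
        return self.action_mask(), reward, done


if __name__ == "__main__":
    # Hypothetical instance: 2 machines, 3 jobs, 1 shared worker, unit setups.
    proc = [[2, 4, 3], [3, 2, 5]]
    setup = [[[1] * 3 for _ in range(4)] for _ in range(2)]
    env = UPMSEnv(proc, setup, n_workers=1)
    env.reset()
    for action in (0, 4, 2):  # job0 -> m0, job1 -> m1, job2 -> m0
        mask, reward, done = env.step(action)
    print(reward)  # -7: machine 0 finishes last at t=7
```

The single worker is what couples the machines here: the setup of job 1 on machine 1 cannot start until t=1 because the worker is busy setting up machine 0, which mirrors the separate worker-machine timeline described for Figure 1.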

Paper Structure

This paper contains 21 sections, 2 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Illustrative example with five jobs, two machines, and two workers, scheduled with the optimal solution. The timeline plot is divided into machine-job (yellow) and worker-machine (grey) scheduling.
  • Figure 2: A simplified representation of a Reinforcement Learning System.
  • Figure 3: Training results of the four Single-Agent models over 5 million timesteps, using the mean episode reward as the metric. The line shows the estimated central tendency, with a confidence interval, across multiple training runs.
  • Figure 4: Training results of the Multi-Agent models over 5 million timesteps, using the mean episode reward as the metric. The line shows the estimated central tendency, with a confidence interval, across multiple training runs.
  • Figure 5: Comparison of training results for the best Single-Agent algorithm, Maskable PPO [journals/corr/abs-2006-14171], and the Multi-Agent algorithm MAPPO [yu2022surprising].
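Maskable PPO relies on invalid action masking [journals/corr/abs-2006-14171]. Before an action is sampled, the logits of actions that are invalid in the current state (for example, assigning a job that is already scheduled) are replaced with negative infinity, so those actions receive zero probability. The following is a minimal, stdlib-only sketch of that masking step for a single categorical action. The function names are hypothetical, and this is not the sb3-contrib implementation.

```python
import math
import random


def masked_softmax(logits, mask):
    """Categorical probabilities after invalid-action masking."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_masked(logits, mask, rng=random):
    """Sample a valid action and return it with its log-probability."""
    probs = masked_softmax(logits, mask)
    # Zero-weight (masked) actions are never drawn by random.choices.
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    mask = [True, False, True]  # action 1 is invalid in this state
    print(masked_softmax(logits, mask))  # action 1 gets probability 0.0
    print(sample_masked(logits, mask))
```

Because the masked probabilities are renormalized over valid actions only, the log-probability used in the PPO ratio also ignores invalid actions. This is why masking tends to help when most (machine, job) pairs become invalid as the schedule fills up.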