A Negotiation-Based Multi-Agent Reinforcement Learning Approach for Dynamic Scheduling of Reconfigurable Manufacturing Systems

Manonmani Sekar, Nasim Nezamoddini

TL;DR

This work addresses dynamic scheduling in reconfigurable manufacturing systems (RMS) using a negotiation-based multi-agent reinforcement learning framework. It combines centralized training with decentralized execution (CTDE) and an enhanced DQN architecture equipped with attention, dueling, prioritized replay, and multi-objective loss to coordinate machine and job agents through an auction-like negotiation mechanism. Experiments in a simulated Enhanced RMS show the approach reduces makespan and tardiness while maintaining high machine utilization, and demonstrate resilience under machine breakdowns when reconfiguration is used judiciously. The results highlight the practical potential of MARL with negotiation for adaptive, real-time RMS scheduling, and point to future work in more realistic simulations, predictive maintenance integration, and transfer learning for broader generalization.

Abstract

Reconfigurable manufacturing systems (RMS) are critical for future market adjustment given their rapid adaptation to fluctuations in consumer demands, the introduction of new technological advances, and disruptions in linked supply chain sections. The adjustable hard settings of such systems require a flexible soft planning mechanism that enables real-time production planning and scheduling amid the existing complexity and variability in their configuration settings. This study explores the application of multi-agent reinforcement learning (MARL) for dynamic scheduling in soft planning of the RMS settings. In the proposed framework, deep Q-network (DQN) agents trained under centralized training learn optimal job-machine assignments in real time while adapting to stochastic events such as machine breakdowns and reconfiguration delays. The model also incorporates negotiation with an attention mechanism to enhance state representation and improve decision focus on critical system features. Key DQN enhancements, including prioritized experience replay, n-step returns, double DQN, and soft target updates, are used to stabilize and accelerate learning. Experiments conducted in a simulated RMS environment demonstrate that the proposed approach outperforms baseline heuristics in reducing makespan and tardiness while improving machine utilization. The reconfigurable manufacturing environment was extended to simulate realistic challenges, including machine failures and reconfiguration times. Experimental results show that while the enhanced DQN agent is effective in adapting to dynamic conditions, machine breakdowns increase variability in key performance metrics such as makespan, throughput, and total tardiness. The results confirm the advantages of applying the MARL mechanism for intelligent and adaptive scheduling in dynamic reconfigurable manufacturing environments.
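
Three of the DQN enhancements named in the abstract (n-step returns, double DQN, and soft target updates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain-list representation of Q-values and weights, and the default values of `gamma` and `tau` are assumptions chosen for illustration, and prioritized experience replay is left out.

```python
from typing import List, Sequence


def n_step_double_dqn_target(
    rewards: Sequence[float],
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # hypothetical discount factor
    done: bool = False,
) -> float:
    """Return an n-step target bootstrapped with a double-DQN estimate.

    rewards        : the n rewards observed after the current state
    q_online_next  : online-network Q-values at the state reached after n steps
    q_target_next  : target-network Q-values at that same state
    """
    # Discounted sum of the n observed rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        # Terminal state: nothing to bootstrap from.
        return n_step_return
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return n_step_return + (gamma ** len(rewards)) * q_target_next[best_action]


def soft_update(target: List[float], online: Sequence[float], tau: float = 0.005) -> None:
    """Move target weights a fraction tau toward the online weights, in place."""
    for i, w in enumerate(online):
        target[i] = (1.0 - tau) * target[i] + tau * w
```

Decoupling action selection from action evaluation is what distinguishes the double-DQN target from the standard one, and the small `tau` keeps the target network changing slowly between updates.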

Paper Structure

This paper contains 47 sections, 40 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Enhanced DQN Architecture
  • Figure 2: Negotiation process flow among Job, Negotiation, Machine, and DQN Scheduler Agents under the CTDE paradigm. The Negotiation Agent centrally evaluates machine bids using an attention network, while the DQN Scheduler performs global reward-based updates.
  • Figure 3: Training performance metrics of Dueling-Attn-DQN on the Improved RMS environment. (a) Episode reward increases consistently, showing effective policy improvement. (b) Training loss rapidly decreases and converges, indicating stable Q-value estimation. (c) Exploration rate decays exponentially from 1.0 to 0.05, transitioning from exploration to exploitation.
  • Figure 4: Learning curves of different reinforcement learning agents. The proposed Dueling-Attention DQN achieved the highest convergence stability.
  • Figure 5: Performance comparison of scheduling agents across four metrics: Makespan, Total Tardiness, Average Utilization, and Average Setup Time. Enhanced DQN demonstrates superior stability and lower makespan while maintaining high utilization.
  • ...and 4 more figures
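
The exploration schedule described in the Figure 3 caption, an exponential decay from 1.0 to 0.05, can be sketched as follows. Only the start and end values come from the caption. The per-step decay factor, the function name, and the choice to clamp at the floor are assumptions for illustration.

```python
def epsilon_at(
    step: int,
    eps_start: float = 1.0,   # from the Figure 3 caption
    eps_end: float = 0.05,    # from the Figure 3 caption
    decay: float = 0.995,     # hypothetical per-step decay factor
) -> float:
    """Exploration rate after `step` decay steps, clamped at eps_end."""
    return max(eps_end, eps_start * decay ** step)
```

With this form the agent acts almost entirely at random early in training and settles at a 5% exploration rate once the decayed value reaches the floor.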