Demystifying Reinforcement Learning in Production Scheduling via Explainable AI

Daniel Fischer; Hannah M. Hüsener; Felix Grumbach; Lukas Vollenkemper; Arthur Müller; Pascal Reusch

TL;DR

The paper investigates how to make the production scheduling decisions of a deep reinforcement learning (DRL) agent explainable to domain experts. It systematically applies two xAI frameworks, SHAP (DeepSHAP) and Captum (Input x Gradient), within a hypotheses-based workflow that combines domain knowledge with the agent's reward structure to generate falsifiable explanations. The study finds that both methods can illuminate decision factors, that DeepSHAP typically yields clearer and more consistent attributions, and that a structured workflow with hypothesis verification improves trust and communication with stakeholders. The proposed approach offers a practical blueprint for deploying explainable DRL in real-world scheduling and emphasizes iterative validation, robustness checks, and audience-tailored interpretations to bridge the gap between AI reasoning and domain expertise.

Abstract

Deep Reinforcement Learning (DRL) is a frequently employed technique to solve scheduling problems. Although DRL agents excel at delivering viable results in short computing times, their reasoning remains opaque. We conduct a case study in which we systematically apply two explainable AI (xAI) frameworks, namely SHAP (DeepSHAP) and Captum (Input x Gradient), to describe the reasoning behind the scheduling decisions of a specialized DRL agent in a flow production. We find that methods in the xAI literature lack falsifiability and consistent terminology, do not adequately consider domain knowledge, the target audience, or real-world scenarios, and typically provide simple input-output explanations rather than causal interpretations. To address these issues, we introduce a hypotheses-based workflow. This approach enables us to inspect whether explanations align with domain knowledge and match the reward hypotheses of the agent. We furthermore tackle the challenge of communicating these insights to third parties by tailoring hypotheses to the target audience; once verified, these hypotheses can serve as interpretations of the agent's behavior. Our proposed workflow emphasizes the repeated verification of explanations and may be applicable to various DRL-based scheduling use cases.
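
To make the Input x Gradient idea concrete, the following is a minimal sketch that does not use Captum and is not the authors' implementation. It attributes the output of a toy scoring function to each input feature as the feature value times the partial derivative of the output with respect to that feature. The scoring function, the feature names, and all numbers are hypothetical stand-ins for an agent's action score and state, and the gradient is approximated with central finite differences rather than backpropagation.

```python
import math


def policy_score(x, w, b):
    """Hypothetical stand-in for an agent's action score: tanh of a weighted sum."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_x_gradient(f, x, eps=1e-6):
    """Attribute f(x) to each feature as x_i * df/dx_i.

    The gradient is approximated by central finite differences, which is an
    assumption of this sketch; a framework would use automatic differentiation.
    """
    attributions = []
    for i in range(len(x)):
        upper = list(x)
        upper[i] += eps
        lower = list(x)
        lower[i] -= eps
        grad_i = (f(upper) - f(lower)) / (2 * eps)
        attributions.append(x[i] * grad_i)
    return attributions


# Hypothetical state features and weights, chosen only for illustration.
features = ["buffer_fill", "setup_needed", "demand_product_5"]
x = [0.9, 1.0, 0.4]
w = [0.5, -1.2, 2.0]
b = 0.1

attr = input_x_gradient(lambda v: policy_score(v, w, b), x)
for name, value in zip(features, attr):
    print(f"{name}: {value:+.4f}")
```

In this toy setting a positive attribution means the feature pushes the score up at the given state and a negative one pushes it down. Whether such signs agree with what a domain expert expects is the kind of statement the hypotheses-based workflow is meant to verify.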

Paper Structure (75 sections, 4 equations, 26 figures, 13 tables)

Introduction
Background and key concepts
Contribution and Research Questions
Literature Review
Reinforcement Learning Applications in Production and Scheduling
State of the art xAI frameworks and methods
The choice of explanation type
xAI Methods
Decision Trees
SHAP (SHapley Additive exPlanation)
DeepLIFT
DeepSHAP
Input X Gradient
Layer-wise relevance propagation
xRL
...and 60 more sections

Figures (26)

Figure 1: Model of the considered two-stage flow production system by müller2024reinforcement.
Figure 2: Framework for xAI in business by tchuente2024methodological. Idea, data and context are presented as three clusters.
Figure 3: Bar plot of the total quantity produced per product. Product 5 was produced the most. Some products were not produced at all in the given dataset.
Figure 4: Order of produced product lots. The x-axis shows the order of the products. The production of product 5 was interrupted three times. The longest unbroken sequence is 33 consecutive lots of product 5.
Figure 5: The buffer fill level over 113 decisions of the agent. After index 50 there is a clear upward trend. The trend line was fitted with a locally weighted linear regression model (frac = 0.66). Note that the y-axis starts at 0.75. The upward trend arises because the throughput of the FAS decreases from index 60 onward, so the PAS fills the buffer with products that become relevant later. The critical phase, in which utilization and setup efforts are balanced, therefore lasts until around index 60.
...and 21 more figures
