Interpretable Modeling of Deep Reinforcement Learning Driven Scheduling

Boyang Li, Zhiling Lan, Michael E. Papka

TL;DR

This work targets the interpretability barrier in DRL-driven HPC scheduling by presenting IRL, which imitates a DRL policy with an interpretable decision-tree model. Using imitation learning, DAgger iterations, and a critical-state pruning strategy, IRL extracts a compact tree that approximates the DRL outputs $Q(s,a)$ while preserving scheduling performance. Trace-based evaluation demonstrates that IRL delivers comparable results to the original DRL and substantially reduces tree size and runtime overhead, while also providing insight into reward design. The approach offers a practical pathway for deploying DRL-based schedulers in production HPC systems, enabling easier debugging, modification, and governance.

Abstract

Deep reinforcement learning has recently been explored for cluster scheduling in high-performance computing (HPC), an approach referred to as DRL scheduling, with promising results. However, deep neural networks (DNNs) lack interpretability, which makes them black-box models to system managers and hinders the practical deployment of DRL scheduling. In this work, we present a framework called IRL (Interpretable Reinforcement Learning) to address the interpretability of DRL scheduling. The core idea is to interpret the DNN (i.e., the DRL policy) as a decision tree by means of imitation learning. Unlike DNNs, decision tree models are non-parametric and easily comprehensible to humans. To extract an effective and efficient decision tree, IRL incorporates the Dataset Aggregation (DAgger) algorithm and introduces the notion of a critical state to prune the derived decision tree. Through trace-based experiments, we demonstrate that IRL can convert a black-box DNN policy into an interpretable rule-based decision tree while maintaining comparable scheduling performance. Additionally, IRL can help with the setting of rewards in DRL scheduling.
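
The imitation-learning loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `expert_policy` is a hand-written stand-in for the black-box DRL policy, `step` is a toy two-job queue model, the four-feature state `(wait0, runtime0, wait1, runtime1)` is hypothetical, and the tree learner is a small Gini-based classifier written for this example. Critical-state pruning is omitted because this summary does not define it.

```python
import random

def expert_policy(state):
    """Hypothetical stand-in for the black-box DRL policy: choose job 0 or job 1."""
    w0, r0, w1, r1 = state
    return 0 if (w0 - 0.5 * r0) >= (w1 - 0.5 * r1) else 1

def step(state, action, rng):
    """Toy dynamics (assumption): the chosen job runs, the other waits that long,
    and a fresh job with zero wait time takes the freed queue slot."""
    w0, r0, w1, r1 = state
    fresh_runtime = rng.uniform(1, 10)
    if action == 0:
        return (0.0, fresh_runtime, w1 + r0, r1)
    return (w0 + r1, r0, 0.0, fresh_runtime)

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth, max_depth):
    """Greedy binary tree. A leaf is an int action; a node is (feature, threshold, left, right)."""
    majority = int(2 * sum(y) >= len(y))
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values to split on
        return majority
    _, f, t = best
    L = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= t]
    R = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return (f, t,
            build_tree([x for x, _ in L], [c for _, c in L], depth + 1, max_depth),
            build_tree([x for x, _ in R], [c for _, c in R], depth + 1, max_depth))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's action."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def dagger(iterations=5, horizon=40, max_depth=4, seed=0):
    """DAgger loop: roll out the current policy, label every visited state with
    the expert's action, aggregate the data, and refit the tree."""
    rng = random.Random(seed)
    data_X, data_y = [], []
    tree = None
    for _ in range(iterations):
        state = (rng.uniform(0, 5), rng.uniform(1, 10),
                 rng.uniform(0, 5), rng.uniform(1, 10))
        for _ in range(horizon):
            data_X.append(state)
            data_y.append(expert_policy(state))  # the expert labels the state
            # The first rollout follows the expert; later rollouts follow the tree.
            action = expert_policy(state) if tree is None else predict(tree, state)
            state = step(state, action, rng)
        tree = build_tree(data_X, data_y, 0, max_depth)
    return tree, data_X, data_y

def fidelity(tree, X, y):
    """Fraction of states on which the tree reproduces the expert's action."""
    return sum(predict(tree, x) == yi for x, yi in zip(X, y)) / len(X)
```

Calling `tree, X, y = dagger()` followed by `fidelity(tree, X, y)` reports how closely the extracted tree matches the stand-in expert on the aggregated states. Rolling out the student tree rather than the expert after the first iteration is what distinguishes DAgger from plain behavioral cloning: the expert labels the states the tree actually visits, including those reached through its own mistakes.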

Paper Structure

This paper contains 19 sections, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: The interaction between the environment and an agent in reinforcement learning.
  • Figure 2: Overview of IRL design. A cylinder represents a data repository $\mathcal{D}$. A rounded rectangle denotes a scheduling policy, which is either DRL or a decision tree. A rectangle represents a dataset, which is either sampled from $\mathcal{D}$ or produced by a scheduling policy (decision tree or DRL).
  • Figure 3: Decision tree (depth=10) generated by IRL from the DQN agent with Reward I. Only the first two levels are shown due to space limitations. Note that the tree's branches split primarily on job wait time.
  • Figure 4: Decision tree (depth=10) generated by IRL from the DQN agent with Reward A. Only the first two levels are shown due to space limitations. Note that the tree's branches split mainly on requested running time.
  • Figure 5: Comparison of scheduling performance with DQN under different reward settings.
  • ...and 3 more figures