Interpretable Modeling of Deep Reinforcement Learning Driven Scheduling
Boyang Li, Zhiling Lan, Michael E. Papka
TL;DR
This work addresses the interpretability barrier in DRL-driven HPC scheduling by presenting IRL, a framework that imitates a DRL policy with an interpretable decision-tree model. Using imitation learning, DAgger iterations, and a critical-state pruning strategy, IRL extracts a compact tree that approximates the DRL outputs $Q(s,a)$ while preserving scheduling performance. Trace-based evaluation shows that IRL achieves scheduling performance comparable to the original DRL policy and substantially reduces tree size and runtime overhead, while also providing insight into reward design. The approach offers a practical path to deploying DRL-based schedulers in production HPC systems, making them easier to debug, modify, and govern.
Abstract
In high-performance computing (HPC), deep reinforcement learning has recently been explored for cluster scheduling (DRL scheduling) and has shown promising results. However, deep neural networks (DNNs) lack interpretability and appear as black-box models to system managers, which hinders the practical deployment of DRL scheduling. In this work, we present a framework called IRL (Interpretable Reinforcement Learning) to address the interpretability of DRL scheduling. The core idea is to interpret the DNN (i.e., the DRL policy) as a decision tree by means of imitation learning. Unlike DNNs, decision tree models are non-parametric and easily comprehensible to humans. To extract an effective and efficient decision tree, IRL incorporates the Dataset Aggregation (DAgger) algorithm and introduces the notion of a critical state to prune the derived decision tree. Through trace-based experiments, we demonstrate that IRL can convert a black-box DNN policy into an interpretable rule-based decision tree while maintaining comparable scheduling performance. Additionally, IRL can help guide the setting of rewards in DRL scheduling.
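
The imitation step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy two-feature state (normalized wait time and job size), a hand-written `teacher_policy` standing in for the DRL policy's greedy action $\arg\max_a Q(s,a)$, and a made-up transition function `step`; all of these names and dynamics are hypothetical. The sketch shows only the DAgger loop (roll out the current student, label the visited states with the teacher, aggregate, refit a depth-limited tree) and omits critical-state pruning.

```python
import random
from collections import Counter

def teacher_policy(state):
    """Hypothetical stand-in for the DRL policy's argmax over Q(s, a)."""
    wait, size = state
    return 1 if 0.6 * wait - 0.4 * size > 0.1 else 0

def step(state, action, rng):
    """Toy transition (an assumption): the action shifts wait time, plus noise."""
    wait, _ = state
    wait += (0.1 if action == 0 else -0.2) + rng.uniform(-0.1, 0.1)
    return (min(1.0, max(0.0, wait)), rng.random())

def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_tree(data, depth):
    """Fit a small CART-style tree on (state, action) pairs.

    Returns a leaf label, or a tuple (feature, threshold, left, right).
    """
    labels = [a for _, a in data]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(data[0][0])):
        values = sorted({s[f] for s, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between distinct values
            left = [a for s, a in data if s[f] <= t]
            right = [a for s, a in data if s[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values
        return majority
    _, f, t = best
    left = [(s, a) for s, a in data if s[f] <= t]
    right = [(s, a) for s, a in data if s[f] > t]
    return (f, t, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict(tree, state):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if state[f] <= t else right
    return tree

def fidelity(tree, data):
    """Fraction of states on which the tree agrees with the teacher label."""
    return sum(predict(tree, s) == a for s, a in data) / len(data)

def dagger(iterations=5, horizon=40, depth=4, seed=0):
    """DAgger-style loop: aggregate teacher-labeled states, refit each round."""
    rng = random.Random(seed)
    dataset, tree = [], None
    for _ in range(iterations):
        state = (rng.random(), rng.random())
        for _ in range(horizon):
            # The teacher always provides the label for the visited state.
            dataset.append((state, teacher_policy(state)))
            # The first rollout follows the teacher; later ones follow the student.
            action = teacher_policy(state) if tree is None else predict(tree, state)
            state = step(state, action, rng)
        tree = fit_tree(dataset, depth)
    return tree, dataset
```

Calling `tree, dataset = dagger()` and then `fidelity(tree, dataset)` reports how often the extracted tree reproduces the teacher's action on the aggregated states. Because later rollouts follow the student, the dataset covers states the tree actually reaches, which is the motivation for using DAgger rather than fitting once on teacher trajectories.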
