Improving Mixed-Criticality Scheduling with Reinforcement Learning
Muhammad El-Mahdy, Nourhan Sakr, Rodrigo Carrasco
TL;DR
The paper addresses offline non-preemptive mixed-criticality scheduling on varying-speed processors, an NP-hard problem, by formulating it as a Markov decision process and solving it with a Masked PPO reinforcement learning agent. The proposed approach prioritizes high-criticality (HI) tasks while maintaining overall system performance, and it is validated on synthetic data (up to 100,000 instances) and on real server data, in scenarios with and without processor degradation. Key findings: around 85% HI completion and around 80% overall completion under degraded conditions, rising to 93% HI and 94% overall completion in stable, no-degradation settings. The work demonstrates the scalability and effectiveness of RL for complex real-time scheduling and outlines future directions, including online and preemptive variants and the integration of safety constraints for safety-critical applications.
Abstract
This paper introduces a novel reinforcement learning (RL) approach to scheduling mixed-criticality (MC) systems on processors with varying speeds. Building on [1], we extend that work to the non-preemptive scheduling problem, which is known to be NP-hard. By modeling this scheduling problem as a Markov Decision Process (MDP), we develop an RL agent capable of generating near-optimal schedules for real-time MC systems. Our RL-based scheduler prioritizes high-criticality tasks while maintaining overall system performance. Through extensive experiments, we demonstrate the scalability and effectiveness of our approach. The RL scheduler significantly improves task completion rates, achieving around 80% overall and 85% for high-criticality tasks across 100,000 synthetic instances and real data under varying system conditions. Moreover, under stable conditions without degradation, the scheduler achieves 94% overall task completion and 93% for high-criticality tasks. These results highlight the potential of RL-based schedulers in real-time and safety-critical applications, offering substantial improvements in handling complex and dynamic scheduling scenarios.
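To make the MDP framing and the role of action masking more concrete, the following is a minimal sketch and not the authors' formulation. It assumes a single processor running at one constant (possibly degraded) speed per episode, jobs described only by work, deadline, and criticality, and an action that picks the next job to run to completion. The names `Job`, `action_mask`, `run_episode`, `hi_first_edf`, and `completion_rates` are hypothetical, and a simple hand-written rule stands in for the learned Masked PPO policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Job:
    work: float      # processing requirement at nominal speed 1.0 (assumed model)
    deadline: float  # absolute deadline
    high: bool       # True for a high-criticality (HI) job


def action_mask(jobs: List[Job], done: List[bool], now: float, speed: float) -> List[bool]:
    """Mark job i as a valid action if it is unscheduled and would still
    meet its deadline when started now at the given speed."""
    return [
        (not done[i]) and now + job.work / speed <= job.deadline
        for i, job in enumerate(jobs)
    ]


def hi_first_edf(jobs: List[Job], mask: List[bool]) -> int:
    """Stand-in policy (an assumption, not the trained agent): among valid
    actions, prefer HI jobs, then the earliest deadline."""
    feasible = [i for i, ok in enumerate(mask) if ok]
    return min(feasible, key=lambda i: (not jobs[i].high, jobs[i].deadline))


def run_episode(
    jobs: List[Job],
    speed: float,
    policy: Callable[[List[Job], List[bool]], int],
) -> List[bool]:
    """Roll out one non-preemptive schedule: each chosen job runs to
    completion before the next decision. The episode ends when no valid
    action remains."""
    done = [False] * len(jobs)
    now = 0.0
    while True:
        mask = action_mask(jobs, done, now, speed)
        if not any(mask):
            return done
        i = policy(jobs, mask)
        if not mask[i]:
            raise ValueError("policy selected a masked action")
        now += jobs[i].work / speed
        done[i] = True


def completion_rates(jobs: List[Job], done: List[bool]) -> Tuple[float, float]:
    """Return (overall completion rate, HI completion rate)."""
    overall = sum(done) / len(jobs)
    hi_done = [d for job, d in zip(jobs, done) if job.high]
    hi_rate = sum(hi_done) / len(hi_done) if hi_done else 1.0
    return overall, hi_rate


if __name__ == "__main__":
    jobs = [Job(2.0, 2.0, True), Job(2.0, 4.0, False), Job(2.0, 4.0, True)]
    for speed in (1.0, 0.5):  # nominal speed vs. an assumed degraded speed
        done = run_episode(jobs, speed, hi_first_edf)
        print(speed, completion_rates(jobs, done))
```

In a masked policy-gradient setup, a mask of this kind would typically be applied to the policy's action distribution so that invalid jobs receive zero probability; here it simply restricts the stand-in rule to feasible choices. The two completion rates mirror the overall and HI metrics reported in the abstract.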
