Improving Mixed-Criticality Scheduling with Reinforcement Learning

Muhammad El-Mahdy, Nourhan Sakr, Rodrigo Carrasco

TL;DR

The paper addresses offline non-preemptive mixed-criticality scheduling on varying-speed processors, an NP-hard problem, by formulating it as a Markov decision process and solving it with a Masked PPO reinforcement learning agent. The proposed approach prioritizes high-criticality (HI) tasks while maintaining overall system performance, and it is validated on both synthetic data (up to 100,000 instances) and real server data, under scenarios with and without processor degradation. Key findings show HI completion around 85% and overall completion around 80% under degraded conditions, and higher performance (93% HI and 94% overall completion) in stable, no-degradation settings. The work demonstrates the scalability and effectiveness of RL for complex real-time scheduling and outlines concrete future directions, including online/preemptive variants and integration of safety constraints for safety-critical applications.

Abstract

This paper introduces a novel reinforcement learning (RL) approach to scheduling mixed-criticality (MC) systems on processors with varying speeds. Building upon the foundation laid by [1], we extend their work to address the non-preemptive scheduling problem, which is known to be NP-hard. By modeling this scheduling challenge as a Markov Decision Process (MDP), we develop an RL agent capable of generating near-optimal schedules for real-time MC systems. Our RL-based scheduler prioritizes high-criticality tasks while maintaining overall system performance. Through extensive experiments, we demonstrate the scalability and effectiveness of our approach. The RL scheduler significantly improves task completion rates, achieving around 80% overall and 85% for high-criticality tasks across 100,000 instances of synthetic data and real data under varying system conditions. Moreover, under stable conditions without degradation, the scheduler achieves 94% overall task completion and 93% for high-criticality tasks. These results highlight the potential of RL-based schedulers in real-time and safety-critical applications, offering substantial improvements in handling complex and dynamic scheduling scenarios.
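
To make the idea of a masked policy over scheduling actions more concrete, the following is a minimal sketch of action masking for non-preemptive job selection. It is not the authors' MDP formulation: the `Job` fields, the feasibility rule in `action_mask`, and the `masked_argmax` helper are all hypothetical, chosen only to illustrate how invalid actions (jobs already scheduled, not yet released, or unable to meet their deadline at the current processor speed) can be hidden from the agent before it picks the next job.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # All fields are illustrative assumptions, not the paper's state encoding.
    release: float    # earliest time the job may start
    deadline: float   # latest time the job may finish
    work: float       # processing requirement at nominal speed 1.0
    high_crit: bool   # True for a HI-criticality job


def action_mask(jobs: List[Job], scheduled: List[bool],
                now: float, speed: float) -> List[bool]:
    """Return True for each job that may be started at time `now`.

    A job is valid if it is unscheduled, already released, and would
    finish by its deadline when run without preemption at `speed`.
    """
    return [
        (not scheduled[i])
        and job.release <= now
        and now + job.work / speed <= job.deadline
        for i, job in enumerate(jobs)
    ]


def masked_argmax(scores: List[float], mask: List[bool]) -> Optional[int]:
    """Pick the highest-scoring valid action, or None if all are masked."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        return None
    return max(valid, key=lambda i: scores[i])


jobs = [
    Job(release=0.0, deadline=4.0, work=2.0, high_crit=True),
    Job(release=0.0, deadline=3.0, work=2.0, high_crit=False),
    Job(release=5.0, deadline=10.0, work=1.0, high_crit=True),
]
scheduled = [False, False, False]

mask_nominal = action_mask(jobs, scheduled, now=0.0, speed=1.0)
mask_degraded = action_mask(jobs, scheduled, now=0.0, speed=0.5)
```

In this sketch, halving the processor speed removes the second job from the valid set because it can no longer meet its deadline, which is the kind of effect a speed-degradation scenario has on the available actions. A learned policy would supply the scores passed to `masked_argmax`; here they would be placeholders.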

Paper Structure

This paper contains 31 sections, 9 figures, 11 tables.

Figures (9)

  • Figure 1: The training curve of the agent using the masked PPO policy, showing how the agent's reward (as defined by the reward function) evolves as the number of training time steps increases.
  • Figure 2: The schedule of an instance from Group 1.
  • Figure 3: The schedule of an instance from Group 2.
  • Figure 4: The average completion rate while varying the LO job percentage with no degradation.
  • Figure 5: The average missed number of jobs while varying the LO job percentage with no degradation.
  • ...and 4 more figures