KernelOracle: Predicting the Linux Scheduler's Next Move with Deep Learning

Sampanna Yashwant Kahu

TL;DR

The paper takes a data-driven approach to Linux kernel scheduling: it trains an LSTM to predict the next task scheduled by the Completely Fair Scheduler (CFS). The dataset is generated from perf sched measurements on a single-CPU VM under sustained load, with 28 distinct tasks encoded for time-series modeling. The LSTM is trained for 30 epochs; test loss decreases and the predicted sequences become increasingly realistic, which suggests that predictive scheduling is feasible and could reduce context switches. The work also discusses practical integration challenges, particularly latency, and outlines hardware-accelerated or optimized implementations as directions toward kernel-level deployment.

Abstract

Efficient task scheduling is central to the Linux kernel, where the Completely Fair Scheduler (CFS) manages CPU resources to balance high utilization with interactive responsiveness. This research applies deep learning to predict the sequence of tasks selected by CFS, with the aim of evaluating the feasibility of a more general and potentially more adaptive task scheduler for diverse workloads. Our contributions are twofold: first, the generation and curation of a novel scheduling dataset from a running Linux kernel that captures real CFS behavior; and second, the development, training, and evaluation of a Long Short-Term Memory (LSTM) network that forecasts the next task to be scheduled. The paper also discusses how such a predictive model could be integrated into the kernel's scheduling framework and what that integration would imply. The findings and methods open avenues for data-driven kernel scheduling, and the full source code is provided for reproducibility and further exploration.
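To make the prediction task concrete, the following is a minimal sketch of how a recorded sequence of scheduled task names could be turned into training examples for a next-task model. It is not the paper's code: the integer-ID encoding, the window length of 3, the helper names `build_vocab` and `make_windows`, and the tiny trace are all assumptions made for illustration.

```python
def build_vocab(task_names):
    """Map each distinct task name to an integer ID (sorted for stability)."""
    return {name: idx for idx, name in enumerate(sorted(set(task_names)))}


def make_windows(ids, window):
    """Build (history, next_id) pairs: `window` past tasks predict the next one."""
    return [
        (ids[i:i + window], ids[i + window])
        for i in range(len(ids) - window)
    ]


# Hypothetical trace of task names in the order the scheduler picked them.
trace = ["nginx", "ab", "nginx", "kworker", "ab", "nginx", "ab"]

vocab = build_vocab(trace)          # one integer ID per distinct task
ids = [vocab[name] for name in trace]
pairs = make_windows(ids, window=3)  # assumed window length

for history, target in pairs:
    print(history, "->", target)
```

Each pair holds a fixed-length history of task IDs and the ID of the task that ran next, which is the shape of input and label a sequence model such as an LSTM would consume. With 28 distinct tasks, as in the dataset described above, the vocabulary would have 28 entries.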

Paper Structure

This paper contains 17 sections, 11 figures.

Figures (11)

  • Figure 1: High-level structure of Recurrent Neural Networks [rnn_image].
  • Figure 2: Example of recording scheduler metrics using the perf command. This example test was run for 50 seconds. In total, 49.915 MB of data was collected, comprising 422084 samples.
  • Figure 3: Visualization of the 'perf sched' command.
  • Figure 4: The number of times each process was scheduled. This was captured while 'ab' was generating load on 'nginx' by sending a total of 1 million HTTP requests.
  • Figure 5: The computational graph of our model.
  • ...and 6 more figures