Interference-free Operating System: A 6 Years' Experience in Mitigating Cross-Core Interference in Linux
Zhaomeng Deng, Ziqi Zhang, Ding Li, Yao Guo, Yunfeng Ye, Yuxin Ren, Ning Jia, Xinwei Hu
TL;DR
The paper investigates cross-core interference originating from the Linux kernel on multi-core real-time systems and documents six years of industrial practice in mitigating it. It identifies fragmentation in existing isolation mechanisms and advocates a unified, partition-aware design with explicit core indicators, isolation-friendly synchronization, and verifiable programming practices. The authors report fixing 34 cross-core bugs and merging numerous patches upstream, with openEuler achieving up to 8.7x lower worst-case jitter and up to 11.5x better schedulability than vanilla Linux. In end-to-end evaluations on real-time systems such as cFS and ROS2, worst-case latency drops by 1.6x and 1.64x, respectively, relative to RT-Linux. The work also offers practical guidance for developers, system designers, and researchers seeking to systematically eliminate OS-induced interference in production environments.
Abstract
Real-time operating systems employ spatial and temporal isolation to guarantee the predictability and schedulability of real-time systems on multi-core processors. Any unbounded and uncontrolled cross-core performance interference poses a significant threat to the timing safety of the system. However, the current Linux kernel has a number of interference issues and is a primary source of interference. Unfortunately, existing research has not systematically and deeply explored cross-core performance interference within the OS itself. This paper presents our industrial practice in mitigating cross-core performance interference in Linux over the past 6 years. We have fixed dozens of interference issues across different Linux subsystems. Compared to the kernel without our improvements, our enhancements reduce the worst-case jitter by a factor of 8.7, yielding up to an 11.5x improvement in system schedulability. For the Core Flight System and the Robot Operating System 2, we reduce worst-case latency by 1.6x and 1.64x, respectively, relative to RT-Linux. Based on our development experience, we summarize the lessons we learned and offer suggestions to system developers for systematically eliminating cross-core interference in three areas: task management, resource management, and concurrency management. Most of our modifications have been merged into the Linux upstream and released in commercial distributions.
