Exploiting Application-to-Architecture Dependencies for Designing Scalable OS
Yao Xiao, Nikos Kanakaris, Anzhe Cheng, Chenzhong Yin, Nesreen K. Ahmed, Shahin Nazarian, Andrei Irimia, Paul Bogdan
TL;DR
The paper addresses the limited scalability and application-awareness of operating systems on multi-core platforms by introducing NetworkedOS, a four-layer network abstraction that links dynamic application instructions, kernel interactions, memory frames, and hardware cores. It combines a compile-time optimization with a run-time mapper. At compile time, an overlapping-cluster partitioning minimizes a quality function $T$ that balances sequential work, parallel work, and inter-process communication (IPC). At run time, a greedy $O(P)$ mapper assigns processes to cores based on memory affinity and inter-process interactions, reducing IPC and message traffic. The four-layer network itself is constructed from instruction traces. Evaluation on multi-core hardware shows improvements over MINIX3, Linux, and Barrelfish in IPC efficiency and in application performance on the NAS and PARSEC benchmarks, with gains of up to 7.11x over Linux on a 128-core system and 2.01x over Barrelfish on a 64-core system. The key contributions are the formal multi-layer network model, the overlapping-cluster partitioning, and the memory-affinity-driven run-time mapping.
Abstract
With the advent of hundreds of cores on a chip to accelerate applications, the operating system (OS) needs to exploit the parallelism provided by the underlying hardware resources to determine the right number of processes to map onto multi-core systems. However, existing OSes are not scalable and are oblivious to applications. We address these issues by adopting a multi-layer network representation of the dynamic application-to-OS-to-architecture dependencies, namely NetworkedOS. We adopt a compile-time analysis and construct a network representing the dependencies between the dynamic instructions translated from the applications and those of the kernel and services. We propose an overlapping partitioning scheme that detects clusters, or processes, that can potentially run in parallel and be mapped onto cores while reducing the number of messages transferred. At run time, processes are mapped onto the multi-core system, taking process affinity into consideration. Our experimental results indicate that NetworkedOS achieves a performance improvement as high as 7.11x compared to Linux running on a 128-core system and 2.01x compared to Barrelfish running on a 64-core system.
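As a rough illustration of affinity-aware run-time mapping, the following Python sketch places each process on the core closest to its memory when that core has room, and otherwise falls back to the next core with spare capacity. It is not the paper's algorithm: the `Process` type, the `preferred_core` field, the per-core `capacity` limit, and the `greedy_map` function are all hypothetical names introduced here, and inter-process interactions are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    # Hypothetical field: index of the core nearest to this process's memory frames.
    preferred_core: int


def greedy_map(processes, num_cores, capacity):
    """Assign each process to a core in one pass, favoring memory affinity.

    Assumed policy (not taken from the paper): use the preferred core if it
    still has capacity, otherwise use the lowest-indexed core with room.
    """
    load = [0] * num_cores   # number of processes currently placed on each core
    mapping = {}             # pid -> core index
    cursor = 0               # every core before the cursor is known to be full

    for proc in processes:
        core = proc.preferred_core
        if load[core] >= capacity:
            # Preferred core is full: advance the cursor to a core with room.
            # The cursor only moves forward, so the total scan cost is bounded
            # by the number of cores across the whole run.
            while cursor < num_cores and load[cursor] >= capacity:
                cursor += 1
            if cursor == num_cores:
                raise ValueError("not enough core capacity for all processes")
            core = cursor
        load[core] += 1
        mapping[proc.pid] = core
    return mapping


if __name__ == "__main__":
    procs = [Process(0, 0), Process(1, 0), Process(2, 0), Process(3, 1)]
    print(greedy_map(procs, num_cores=2, capacity=2))
```

Because the fallback cursor never moves backward, the sketch runs in time proportional to the number of processes plus the number of cores, which is in the spirit of a single linear pass over the processes.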
