SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures
Christina Giannoula, Foteini Strati, Dimitrios Siakavaras, Georgios Goumas, Nectarios Koziris
TL;DR
SmartPQ tackles the challenge of building scalable concurrent priority queues on NUMA architectures by adaptively switching between a NUMA-oblivious mode and a NUMA-aware, Nuddle-backed mode. Built on the generic Nuddle framework, SmartPQ uses a lightweight decision-tree classifier to predict which mode performs better under the current workload characteristics, and it switches between modes with minimal overhead. Empirical results show that SmartPQ delivers the best performance across diverse contention regimes (on average $1.87$-fold over SprayList), and its classifier selects the better mode with $87.9\%$ accuracy at a modest misprediction cost. This work demonstrates that dynamic adaptation between NUMA-aware and NUMA-oblivious implementations can sustain high throughput in both insert-dominated and deleteMin-dominated workloads, and it sets the stage for applying similar adaptivity to other NUMA-centric data structures.
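A minimal Python sketch of the mode-selection idea follows. It is not SmartPQ's trained classifier: the two features (thread count and insert fraction), the thresholds, and the names `predict_mode`, `Workload`, and `ModeController` are hypothetical, chosen only to show how a shallow tree of cheap comparisons can map workload characteristics to an algorithmic mode, and how a controller can avoid a transition when the prediction has not changed.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NUMA_OBLIVIOUS = "numa-oblivious"
    NUMA_AWARE = "numa-aware"


@dataclass(frozen=True)
class Workload:
    threads: int            # number of active threads (hypothetical feature)
    insert_fraction: float  # share of operations that are inserts, in [0, 1]


def predict_mode(w: Workload) -> Mode:
    """Hand-written decision tree; every threshold here is hypothetical."""
    if w.insert_fraction >= 0.5:
        # Insert-dominated: operations spread over the queue, so the
        # NUMA-oblivious mode can exploit their parallelism.
        return Mode.NUMA_OBLIVIOUS
    if w.threads <= 8:
        # Few threads: little contention on the highest-priority element.
        return Mode.NUMA_OBLIVIOUS
    # deleteMin-dominated with many threads: heavy contention on the
    # minimum element, which favors the NUMA-aware mode.
    return Mode.NUMA_AWARE


class ModeController:
    """Re-evaluates the workload and switches only when the prediction changes."""

    def __init__(self, initial: Mode = Mode.NUMA_OBLIVIOUS) -> None:
        self.mode = initial
        self.switches = 0

    def observe(self, w: Workload) -> Mode:
        predicted = predict_mode(w)
        if predicted != self.mode:
            # A real queue would have to coordinate in-flight operations
            # here; this sketch only records the transition.
            self.mode = predicted
            self.switches += 1
        return self.mode


ctrl = ModeController()
print(ctrl.observe(Workload(threads=64, insert_fraction=0.9)).name)  # NUMA_OBLIVIOUS
print(ctrl.observe(Workload(threads=64, insert_fraction=0.1)).name)  # NUMA_AWARE
print(ctrl.switches)  # 1
```

Each prediction costs only a few comparisons, so it is cheap enough to re-evaluate periodically; in a real system the costly part is the transition itself, which this sketch does not model.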
Abstract
Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Although several NUMA-oblivious implementations scale to a high number of threads by exploiting the parallelism of insert operations, they scale poorly in deleteMin-dominated workloads. This is because all threads compete to access the same memory locations, i.e., the highest-priority element of the queue, thus incurring excessive cache coherence traffic and non-uniform memory accesses between the nodes of a NUMA system. In such scenarios, NUMA-aware implementations are typically used to improve performance on a NUMA system. In this work, we propose an adaptive priority queue, called SmartPQ. SmartPQ tunes itself by switching between a NUMA-oblivious and a NUMA-aware algorithmic mode to achieve high performance under all contention scenarios. SmartPQ has two key components. First, it is built on top of NUMA Node Delegation (Nuddle), a generic low-overhead technique for constructing efficient NUMA-aware data structures using any concurrent NUMA-oblivious implementation as its backbone. Second, SmartPQ integrates a lightweight decision-making mechanism that decides when to switch between the NUMA-oblivious and NUMA-aware algorithmic modes. Our evaluation shows that, in NUMA systems, SmartPQ performs best across the various contention scenarios with an 87.9% success rate, and it dynamically adapts between the NUMA-aware and NUMA-oblivious algorithmic modes with negligible performance overhead. SmartPQ improves performance by 1.87x on average over SprayList, the state-of-the-art NUMA-oblivious priority queue.
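The delegation idea behind Nuddle can be illustrated with a small Python sketch. It assumes that client threads hand each operation to a small pool of server threads, which are the only threads that touch the underlying NUMA-oblivious queue; in a real NUMA deployment those servers would presumably be pinned to a single node so the queue's hot memory stays node-local, which this version does not attempt. `LockedHeapPQ` (a lock around `heapq`) is a hypothetical stand-in for an arbitrary concurrent NUMA-oblivious backbone such as SprayList, and the request-queue protocol in `DelegatedPQ` is an illustrative assumption, not Nuddle's actual client-server communication scheme.

```python
import heapq
import queue
import threading
from typing import Optional


class LockedHeapPQ:
    """Hypothetical stand-in for any concurrent NUMA-oblivious priority queue."""

    def __init__(self) -> None:
        self._heap: list[int] = []
        self._lock = threading.Lock()

    def insert(self, key: int) -> None:
        with self._lock:
            heapq.heappush(self._heap, key)

    def delete_min(self) -> Optional[int]:
        with self._lock:
            return heapq.heappop(self._heap) if self._heap else None


class _Request:
    """One delegated operation plus a slot for its result."""

    def __init__(self, op: str, key: Optional[int] = None) -> None:
        self.op = op
        self.key = key
        self.result: Optional[int] = None
        self.done = threading.Event()


class DelegatedPQ:
    """Clients never touch the backbone; server threads apply their requests.

    Server pinning to a NUMA node is omitted (an assumption of this sketch).
    """

    def __init__(self, backbone: LockedHeapPQ, num_servers: int = 1) -> None:
        self._backbone = backbone
        self._requests: "queue.Queue[_Request]" = queue.Queue()
        self._servers = [
            threading.Thread(target=self._serve, daemon=True)
            for _ in range(num_servers)
        ]
        for server in self._servers:
            server.start()

    def _serve(self) -> None:
        # Only server threads execute operations on the backbone.
        while True:
            req = self._requests.get()
            if req.op == "insert":
                self._backbone.insert(req.key)
            elif req.op == "delete_min":
                req.result = self._backbone.delete_min()
            req.done.set()
            if req.op == "stop":
                return

    def _delegate(self, req: _Request) -> Optional[int]:
        self._requests.put(req)
        req.done.wait()  # the client blocks until a server has applied it
        return req.result

    def insert(self, key: int) -> None:
        self._delegate(_Request("insert", key))

    def delete_min(self) -> Optional[int]:
        return self._delegate(_Request("delete_min"))

    def close(self) -> None:
        # Each stop request is consumed by exactly one server, which then exits.
        for _ in self._servers:
            self._delegate(_Request("stop"))
        for server in self._servers:
            server.join()


pq = DelegatedPQ(LockedHeapPQ(), num_servers=2)
clients = [threading.Thread(target=pq.insert, args=(k,)) for k in (5, 1, 3)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print([pq.delete_min() for _ in range(4)])  # [1, 3, 5, None]
pq.close()
```

Because the backbone is only ever called by the servers, it can be any thread-safe priority queue; swapping `LockedHeapPQ` for a different implementation leaves the client-facing `insert`/`delete_min` interface unchanged, which mirrors the "any NUMA-oblivious backbone" property described above.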
