SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures

Christina Giannoula; Foteini Strati; Dimitrios Siakavaras; Georgios Goumas; Nectarios Koziris


TL;DR

SmartPQ tackles the challenge of scalable concurrent priority queues on NUMA architectures by adaptively switching between a NUMA-oblivious mode and a NUMA-aware mode built on Nuddle. Nuddle is a generic delegation framework that wraps any concurrent NUMA-oblivious implementation, and SmartPQ adds a lightweight decision-tree classifier that predicts which mode performs better under the current workload characteristics, so that transitions incur minimal overhead. Empirical results show that SmartPQ delivers the best performance across diverse contention regimes ($1.87\times$ on average over SprayList), with the classifier selecting the better mode in $87.9\%$ of cases and mispredictions costing little. This work demonstrates that dynamic adaptation between NUMA-aware and NUMA-oblivious implementations can sustain high throughput in both insert-dominated and deleteMin-dominated workloads, and sets the stage for applying similar adaptivity to other NUMA-centric data structures.

Abstract

Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Several NUMA-oblivious implementations scale up to a high number of threads by exploiting the potential parallelism of insert operations, but they scale poorly in deleteMin-dominated workloads. This is because all threads compete to access the same memory locations, i.e., the highest-priority element of the queue, thus incurring excessive cache coherence traffic and non-uniform memory accesses between nodes of a NUMA system. In such scenarios, NUMA-aware implementations are typically used to improve performance on a NUMA system. In this work, we propose an adaptive priority queue, called SmartPQ. SmartPQ tunes itself by switching between a NUMA-oblivious and a NUMA-aware algorithmic mode to achieve high performance across contention scenarios. SmartPQ has two key components. First, it is built on top of NUMA Node Delegation (Nuddle), a generic low-overhead technique for constructing efficient NUMA-aware data structures using any concurrent NUMA-oblivious implementation as its backbone. Second, SmartPQ integrates a lightweight decision-making mechanism to decide when to switch between the NUMA-oblivious and NUMA-aware algorithmic modes. Our evaluation shows that, in NUMA systems, SmartPQ achieves the best performance across contention scenarios with an 87.9% success rate, and dynamically adapts between the NUMA-aware and NUMA-oblivious algorithmic modes with negligible performance overhead. SmartPQ improves performance by 1.87x on average over SprayList, the state-of-the-art NUMA-oblivious priority queue.
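
The mode-switching idea described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `WorkloadStats` features, the thresholds inside `predict_mode`, the fixed-size sampling window, and the `AdaptivePQ` class are hypothetical and are not taken from the paper. The hand-written `predict_mode` only stands in for a trained decision tree, and a single locked binary heap stands in for the concurrent backbone, so the mode is just a flag here rather than a different execution path.

```python
import heapq
import threading
from dataclasses import dataclass

OBLIVIOUS = "numa_oblivious"
AWARE = "numa_aware"


@dataclass
class WorkloadStats:
    """Hypothetical features sampled over a recent window of operations."""
    num_threads: int
    queue_size: int
    insert_fraction: float  # share of insert operations in the window, 0..1


def predict_mode(stats: WorkloadStats) -> str:
    """Hand-written stand-in for a trained decision tree.

    The thresholds are made up for illustration only.
    """
    if stats.insert_fraction >= 0.7:
        return OBLIVIOUS  # insert-dominated: little contention on the minimum
    if stats.num_threads <= 8:
        return OBLIVIOUS  # few threads: contention stays low
    return AWARE          # many threads, deleteMin-heavy: prefer delegation


class AdaptivePQ:
    """Toy priority queue that re-evaluates its mode every `window` operations."""

    def __init__(self, num_threads: int, window: int = 1000):
        self._heap = []
        self._lock = threading.Lock()  # stand-in for a concurrent backbone
        self._num_threads = num_threads
        self._window = window
        self._ops = 0
        self._inserts = 0
        self.mode = OBLIVIOUS

    def insert(self, key):
        with self._lock:
            heapq.heappush(self._heap, key)
            self._record(is_insert=True)

    def delete_min(self):
        with self._lock:
            key = heapq.heappop(self._heap) if self._heap else None
            self._record(is_insert=False)
            return key

    def _record(self, is_insert: bool):
        # Called with the lock held: update counters, re-predict at window end.
        self._ops += 1
        self._inserts += int(is_insert)
        if self._ops == self._window:
            stats = WorkloadStats(
                self._num_threads, len(self._heap), self._inserts / self._ops
            )
            self.mode = predict_mode(stats)
            self._ops = 0
            self._inserts = 0
```

In this sketch both modes share one underlying structure, mirroring the abstract's point that the NUMA-aware mode is layered on a NUMA-oblivious backbone, so a switch needs no data migration. A real implementation would route operations differently depending on `mode`, either directly to the structure or through delegation.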


Paper Structure (17 sections, 11 figures, 3 tables)


Introduction
NUMA Node Delegation (Nuddle)
Overview
Implementation Details
SmartPQ
Selecting the Algorithmic Mode
The Need for a Machine Learning Approach
Decision Tree Classifier
Implementation Details
Experimental Evaluation
Throughput of Nuddle
Throughput of SmartPQ
Classifier Accuracy
Varying the Contention Workload
Discussion & Future Work
...and 2 more sections

Figures (11)

Figure 1: Throughput achieved by a NUMA-oblivious [spraylist, herlihy] and a NUMA-aware [ffwd] priority queue, both initialized with 1024 keys. We use 64 threads that perform a mix of insert and deleteMin operations in parallel, and the key range is set to 2048 keys. We use all NUMA nodes of a 4-node NUMA system, the characteristics of which are presented in the Experimental Evaluation section.
Figure 2: High-level overview of SmartPQ. SmartPQ dynamically adapts its algorithm to the contention levels of the workload based on the prediction of a simple classifier.
Figure 3: High-level design of ffwd [ffwd] and Nuddle. Nuddle locates all server threads on the same NUMA node to design a NUMA-aware scheme, and associates each of them with multiple client thread groups. Nuddle uses the communication protocol proposed in ffwd [ffwd].
Figure 4: Helper structures of Nuddle.
Figure 5: Initialization functions of Nuddle.
...and 6 more figures
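
The delegation pattern summarized in the Figure 3 caption (server threads that execute operations on behalf of groups of client threads) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the `DelegationPQ` class, its inbox-per-server layout, and the group-to-server mapping are hypothetical. It uses `queue.Queue` and `Future` objects in place of the ffwd communication protocol, a locked binary heap in place of a concurrent backbone, and it does not pin threads to a NUMA node, which plain Python threads cannot express.

```python
import heapq
import queue
import threading
from concurrent.futures import Future


class DelegationPQ:
    """Toy delegation scheme: clients post requests, servers execute them."""

    def __init__(self, num_servers: int = 2, clients_per_group: int = 4):
        self._heap = []
        self._heap_lock = threading.Lock()  # stand-in for a concurrent backbone
        self._clients_per_group = clients_per_group
        # One inbox per server; each server handles several client groups.
        self._inboxes = [queue.Queue() for _ in range(num_servers)]
        self._servers = [
            threading.Thread(target=self._serve, args=(inbox,), daemon=True)
            for inbox in self._inboxes
        ]
        for server in self._servers:
            server.start()

    def _serve(self, inbox: queue.Queue):
        # Server loop: take a request, apply it to the shared heap, reply.
        while True:
            request = inbox.get()
            if request is None:  # shutdown signal
                break
            op, key, reply = request
            with self._heap_lock:
                if op == "insert":
                    heapq.heappush(self._heap, key)
                    result = True
                else:
                    result = heapq.heappop(self._heap) if self._heap else None
            reply.set_result(result)

    def _inbox_for(self, client_id: int) -> queue.Queue:
        # Clients are grouped by id; groups are spread round-robin over servers.
        group = client_id // self._clients_per_group
        return self._inboxes[group % len(self._inboxes)]

    def insert(self, client_id: int, key):
        reply = Future()
        self._inbox_for(client_id).put(("insert", key, reply))
        return reply.result()  # block until a server has executed the request

    def delete_min(self, client_id: int):
        reply = Future()
        self._inbox_for(client_id).put(("delete_min", None, reply))
        return reply.result()

    def shutdown(self):
        for inbox in self._inboxes:
            inbox.put(None)
        for server in self._servers:
            server.join()
```

The design point the sketch tries to convey is that only the server threads ever touch the shared structure, so in a real NUMA deployment with servers placed on one node, accesses to the queue would stay local to that node while clients on other nodes exchange only small request and response messages.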
