BlockFIFO & MultiFIFO: Scalable Relaxed Queues
Stefan Koch, Peter Sanders, Marvin Williams
TL;DR
This work introduces two scalable relaxed concurrent FIFO queues, MultiFIFO and BlockFIFO, to overcome contention in strict FIFO queues. MultiFIFO adapts the MultiQueue design by using internal ring buffers with insertion timestamps, achieving constant-time operations and rank error linear in the number of threads $p$. BlockFIFO builds a lock-free structure from blocks and push/pop windows, enabling high throughput at the cost of larger rank errors, with practical enhancements such as a bitset and lookahead windows to improve performance. Extensive evaluations across micro-benchmarks and BFS on diverse architectures demonstrate order-of-magnitude throughput gains over prior relaxed and strict queues, highlighting the practical impact on parallel graph processing and other throughput-centric workloads.
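To make the MultiFIFO idea concrete, here is a minimal sketch of a MultiQueue-style relaxed FIFO. It is not the authors' implementation. The class name `MultiFIFOSketch`, the factor `c`, the per-sub-queue try-locks, and the "two-choice" pop rule (sample two sub-queues and pop from the one whose head is older) are illustrative assumptions borrowed from the general MultiQueue pattern. Unbounded deques stand in for the paper's ring buffers.

```python
import random
import threading
import time
from collections import deque


class MultiFIFOSketch:
    """Hypothetical MultiFIFO-style relaxed queue (illustration only).

    c * p sub-queues, each guarded by its own lock, hold (timestamp, item)
    pairs. push appends to a random sub-queue. pop samples two sub-queues and
    takes from the one whose head has the older timestamp.
    """

    def __init__(self, num_threads, c=2):
        n = max(2, c * num_threads)
        # Assumption: plain deques stand in for bounded ring buffers.
        self.queues = [deque() for _ in range(n)]
        self.locks = [threading.Lock() for _ in range(n)]

    def _pick(self):
        return random.randrange(len(self.queues))

    def push(self, item):
        while True:
            i = self._pick()
            # Try-lock: if the sub-queue is busy, retry on another random one.
            if self.locks[i].acquire(blocking=False):
                try:
                    # Stamping under the lock keeps each sub-queue sorted by time.
                    self.queues[i].append((time.monotonic_ns(), item))
                    return
                finally:
                    self.locks[i].release()

    def _head_ts(self, i):
        # Unlocked peek (relies on CPython's atomic deque indexing); a stale
        # value only affects which sub-queue is chosen, not correctness.
        try:
            return self.queues[i][0][0]
        except IndexError:
            return None

    def pop(self, attempts=None):
        """Return some item, or None if every sub-queue is empty."""
        for _ in range(attempts or 4 * len(self.queues)):
            i, j = self._pick(), self._pick()
            ti, tj = self._head_ts(i), self._head_ts(j)
            if ti is None and tj is None:
                continue
            # Two-choice rule: prefer the sub-queue with the older head.
            k = i if tj is None or (ti is not None and ti <= tj) else j
            if self.locks[k].acquire(blocking=False):
                try:
                    if self.queues[k]:
                        return self.queues[k].popleft()[1]
                finally:
                    self.locks[k].release()
        # Fallback scan so a non-empty, quiescent queue never reports empty.
        for k, q in enumerate(self.queues):
            with self.locks[k]:
                if q:
                    return q.popleft()[1]
        return None
```

Every element is returned exactly once, but only approximately in FIFO order. Because each sub-queue is internally ordered by timestamp, the oldest element overall is always at the head of some sub-queue. The two-choice rule then keeps the typical deviation from strict FIFO order bounded in terms of the number of sub-queues.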
Abstract
FIFO queues are a fundamental data structure used in a wide range of applications. Concurrent FIFO queues allow multiple execution threads to access the queue simultaneously. Maintaining strict FIFO semantics in concurrent queues leads to low throughput due to high contention at the head and tail of the queue. By relaxing the FIFO semantics to allow some reordering of elements, it becomes possible to achieve much higher scalability. This work presents two orthogonal designs for relaxed concurrent FIFO queues, one derived from the MultiQueue and the other based on ring buffers. We evaluate both designs extensively on various micro-benchmarks and a breadth-first search application on large graphs. Both designs outperform state-of-the-art relaxed and strict FIFO queues, achieving higher throughput and better scalability.
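The "reordering of elements" that a relaxed queue permits is usually quantified by the rank error mentioned above. The following sketch measures it offline, assuming a common definition: a pop's rank error is the number of elements still in the queue that were pushed before the popped element, so a strict FIFO scores 0 on every pop. It also assumes, for simplicity, that all pushes happen before any pop. The function name `rank_errors` is hypothetical, and the paper's exact measurement procedure may differ.

```python
def rank_errors(push_order, pop_order):
    """Per-pop rank errors, assuming all pushes precede all pops.

    A Fenwick tree over push positions marks which elements are still
    queued, so each pop costs O(log n).
    """
    n = len(push_order)
    pos = {item: i for i, item in enumerate(push_order)}
    tree = [0] * (n + 1)

    def add(i, delta):
        # Update the count at push position i (0-based).
        i += 1
        while i <= n:
            tree[i] += delta
            i += i & -i

    def queued_before(i):
        # Number of still-queued elements with push position < i.
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    for i in range(n):
        add(i, 1)
    errors = []
    for item in pop_order:
        p = pos[item]
        errors.append(queued_before(p))
        add(p, -1)
    return errors
```

For example, pushing `a, b, c, d` and popping `b, d, a, c` yields rank errors `[1, 2, 0, 0]`. Popping `b` skips the older `a`, and popping `d` skips both `a` and `c`.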
