No Cords Attached: Coordination-Free Concurrent Lock-Free Queues
Yusuf Motiwala
TL;DR
This work tackles the complexity of concurrent lock-free queues by replacing indefinite memory protection with a bounded, coordination-free memory reclamation mechanism. The proposed Cyclic Memory Protection (CMP) combines state-based and cycle-based protections to achieve strict FIFO ordering, unbounded capacity, and lock-free progress while guaranteeing bounded reclamation and resilience to stalled or failed threads. CMP modifies the enqueue and dequeue paths of the Michael & Scott queue to remove unnecessary coordination, and introduces a scan_cursor and a cycle-management framework to maintain safety without announcements. Empirical results on production-like workloads show substantial throughput gains and robust latency behavior under high contention, highlighting CMP's practical impact for AI training and inference pipelines with hundreds of threads per node. Overall, CMP demonstrates that careful, bounded temporal protection can restore simplicity and performance in highly concurrent queues without sacrificing queue semantics.
Abstract
The queue is conceptually one of the simplest data structures: a basic FIFO container. However, ensuring correctness in the presence of concurrency makes existing lock-free implementations significantly more complex than the original structure. Coordination mechanisms introduced to prevent hazards such as ABA, use-after-free, and unsafe reclamation often dominate the design, overshadowing the queue itself. Many schemes compromise strict FIFO ordering, unbounded capacity, or lock-free progress to mask coordination overheads. Yet the true source of complexity lies in the pursuit of infinite protection against reclamation hazards, a goal that is theoretically sound but impractical and costly. This pursuit not only drives unnecessary complexity but also creates a protection paradox, in which excessive protection reduces system resilience rather than improving it. While such costs may be tolerable in conventional workloads, the AI era has shifted the paradigm: training and inference pipelines involve hundreds to thousands of concurrent threads per node, and at this scale protection and coordination overheads dominate, often costing far more than the basic queue operations themselves. This paper introduces Cyclic Memory Protection (CMP), a coordination-free queue that preserves strict FIFO semantics, unbounded capacity, and lock-free progress while restoring simplicity. Through bounded protection windows that provide practical reclamation guarantees, CMP recovers the strict FIFO ordering that other approaches sacrificed. We prove strict FIFO ordering and safety via linearizability and bounded reclamation analysis, and show experimentally that CMP outperforms state-of-the-art lock-free queues by 1.72x to 4x under high contention while scaling to hundreds of threads. Our work demonstrates that highly concurrent queues can return to their fundamental simplicity without weakening queue semantics.
