Semaphores Augmented with a Waiting Array
Dave Dice, Alex Kogan
TL;DR
This paper tackles the scalability and FCFS fairness challenges of traditional semaphores by transforming a simple ticket-semaphore into a scalable TWA-semaphore that uses a waiting array to diffuse waiting and reduce global coherence traffic. The approach combines ticket-based admission with a fixed-size waiting table and a tunable LongTermThreshold that determines whether a thread waits short-term by spinning on the semaphore itself or long-term via the waiting array, with optional waiting-chains and park-unpark support. An empirical evaluation demonstrates strong performance at both low and high contention, and the paper discusses practical integration with futex or other blocking primitives. The result is a compact, low-latency semaphore design suitable for Linux kernel usage and application-level synchronization.
Abstract
Semaphores are a widely used, foundational synchronization and coordination construct for shared-memory multithreaded programming. They are a keystone concept, in the sense that most other synchronization constructs can be implemented in terms of semaphores, although the converse does not generally hold. Semaphores and the quality of their implementation are of consequence, as they remain heavily used in the Linux kernel and are also available for application programming via the pthreads programming interface. We first show that semaphores can be implemented by borrowing ideas from the classic ticket lock algorithm. The resulting "ticket-semaphore" algorithm is simple and compact (space efficient) but does not scale well because of the detrimental impact of global spinning. We then transform "ticket-semaphore" into the "TWA-semaphore" by applying techniques derived from the "TWA - Ticket Locks Augmented with a Waiting Array" algorithm, yielding a scalable semaphore that remains compact and has extremely low latency.
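To make the two designs named above concrete, the following is a minimal C11 sketch, not the authors' code. It assumes a semaphore with two counters, `ticket` and `grant`, where a thread is admitted once `grant` has advanced past its ticket. All identifiers (`ticket_sem`, `tsem_take`, `twa_take`, `slot_for`, `WAIT_SLOTS`, `LONG_TERM_THRESHOLD`), the hash function, and the constants are hypothetical choices made for illustration. Parking, waiting-chains, and futex integration are omitted.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and constants throughout: an illustrative sketch only. */
typedef struct {
    _Atomic uint64_t ticket;  /* next ticket to hand out (arrival order) */
    _Atomic uint64_t grant;   /* total permits made available so far     */
} ticket_sem;

static void tsem_init(ticket_sem *s, uint64_t permits) {
    atomic_init(&s->ticket, 0);
    atomic_init(&s->grant, permits);
}

/* Ticket-semaphore: every waiter spins on the one shared grant field. */
static void tsem_take(ticket_sem *s) {
    uint64_t tx = atomic_fetch_add(&s->ticket, 1);
    while ((int64_t)(atomic_load(&s->grant) - tx) <= 0) { /* global spin */ }
}

static void tsem_post(ticket_sem *s) {
    atomic_fetch_add(&s->grant, 1);
}

/* TWA variant: distant waiters spin on a slot of a shared waiting array. */
#define WAIT_SLOTS 4096           /* assumed table size, 2^12 slots        */
#define LONG_TERM_THRESHOLD 1     /* assumed number of short-term spinners */
static _Atomic uint32_t wait_array[WAIT_SLOTS];

static size_t slot_for(const ticket_sem *s, uint64_t tx) {
    /* Multiplicative hash of (semaphore address, ticket); top 12 bits. */
    uint64_t h = ((uint64_t)(uintptr_t)s + tx) * 0x9E3779B97F4A7C15ull;
    return (size_t)(h >> 52);
}

static void twa_take(ticket_sem *s) {
    uint64_t tx = atomic_fetch_add(&s->ticket, 1);
    int64_t dx = (int64_t)(atomic_load(&s->grant) - tx);
    if (dx > 0) return;                          /* uncontended fast path */
    if (dx <= -LONG_TERM_THRESHOLD) {            /* far from admission    */
        _Atomic uint32_t *slot = &wait_array[slot_for(s, tx)];
        for (;;) {
            uint32_t seen = atomic_load(slot);   /* sample before recheck */
            dx = (int64_t)(atomic_load(&s->grant) - tx);
            if (dx > -LONG_TERM_THRESHOLD) break;
            while (atomic_load(slot) == seen) { /* long-term spin */ }
        }
    }
    while ((int64_t)(atomic_load(&s->grant) - tx) <= 0) { /* short-term spin */ }
}

static void twa_post(ticket_sem *s) {
    uint64_t g = atomic_fetch_add(&s->grant, 1); /* g is the old value */
    /* Ticket g + threshold just came within range: poke its slot. */
    atomic_fetch_add(&wait_array[slot_for(s, g + LONG_TERM_THRESHOLD)], 1);
}
```

In this sketch both variants hand out permits in ticket order. The difference is where waiters spin: `tsem_take` keeps all of them on `grant`, whereas `twa_take` keeps at most `LONG_TERM_THRESHOLD` threads there and moves the rest to hashed slots. Slot collisions can cause spurious wakeups but not missed ones, because a long-term waiter samples its slot before rechecking `grant`.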
