DiLi: A Lock-Free Asynchronously Distributable Linked List
Raaghav Ravishankar, Sandeep Kulkarni, Sathya Peri, Gokarna Sharma
TL;DR
DiLi tackles the challenge of scaling lock-free linked lists across multiple machines by introducing conditional lock-freedom, a framework that preserves lock-free behavior under finite communication delays. It deploys a two-layer, distributable architecture with a lazily replicated registry and sublists, enabling asynchronous Split/Move/Switch operations to balance load without blocking client operations. Empirical results show comparable throughput to state-of-the-art lock-free skip lists on a single machine and linear throughput scaling across multiple machines, with practical overheads for background operations. This work provides a concrete path to horizontally scale lock-free data structures in distributed settings, potentially extending to other foundational structures like trees and graphs.
Abstract
Modern databases use dynamic search structures that store huge amounts of data and often serve it using multi-threaded algorithms to meet ever-increasing throughput needs. When the required throughput exceeds the capacity of the machine hosting the structure, one must either replace the underlying hardware (an option that is typically not viable and introduces long downtime) or make the data structure distributed. Static partitioning of the data structure for distribution is undesirable: it is prone to uneven load distribution over time, and changing the partitioning scheme later requires downtime. Since a distributed data structure inherently relies on communication support from the network stack and operating system, we introduce the notion of conditional lock-freedom, which extends lock-free computation with reasonable assumptions about communication between processes. We present DiLi, a conditional lock-free, linearizable, and distributable linked list that can be asynchronously and dynamically (1) partitioned into multiple sublists and (2) load balanced by distributing sublists across multiple machines. DiLi provides primitives for both operations that preserve the lock-free property of the underlying search structure, which supports find, remove, and insert of a key as client operations. A search in DiLi uses a novel traversal: a binary search on the partitioning scheme, followed by a linear traversal over a number of linked nodes that can be kept bounded. As a result, we empirically show that DiLi performs as well as state-of-the-art lock-free concurrent search structures based on linked lists when executed on a single machine. We also show that the throughput of DiLi scales linearly with the number of machines that host it.
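
The two-phase search described in the abstract can be illustrated with a minimal, single-threaded sketch. It is not DiLi's implementation: it has no lock-free synchronization, no replication, and no distribution. The names `Node`, `Sublist`, `Registry`, `build_sublist`, and `find` are hypothetical, and the sketch assumes the partitioning scheme can be modeled as a sorted array of lower-bound keys, one per sublist.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    next: Optional["Node"] = None


@dataclass
class Sublist:
    lower_bound: int          # smallest key this sublist is responsible for
    head: Optional[Node]      # first node of the sorted sublist, or None


@dataclass
class Registry:
    # Hypothetical stand-in for the partitioning scheme:
    # bounds[i] is the lower bound of sublists[i], in ascending order.
    bounds: List[int]
    sublists: List[Sublist]


def build_sublist(lower_bound: int, keys: List[int]) -> Sublist:
    """Build a sorted singly linked sublist from the given keys."""
    head = None
    for k in sorted(keys, reverse=True):
        head = Node(k, head)
    return Sublist(lower_bound, head)


def find(registry: Registry, key: int) -> bool:
    # Phase 1: binary search over the partition bounds to pick the sublist
    # whose range contains the key.
    idx = bisect_right(registry.bounds, key) - 1
    if idx < 0:
        return False  # key is below every partition's lower bound

    # Phase 2: linear traversal inside that one sublist only.
    node = registry.sublists[idx].head
    while node is not None and node.key < key:
        node = node.next
    return node is not None and node.key == key


# Example registry with three partitions (assumed data, for illustration).
parts = [
    build_sublist(0, [1, 4, 7]),
    build_sublist(10, [10, 15]),
    build_sublist(20, [22, 29]),
]
registry = Registry([p.lower_bound for p in parts], parts)
```

In this sketch the cost of `find` is logarithmic in the number of sublists plus linear in the length of one sublist, which is why keeping sublists short (for example by splitting them) keeps the linear phase bounded.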
