A Lock-Free Work-Stealing Algorithm for Bulk Operations

Raja Sai Nandhan Yadav Kataru; Danial Davarnia; Ali Jannesari

TL;DR

This paper presents a new lock-free work-stealing queue tailored for a master-worker framework used in the parallelization of a mixed-integer programming optimization solver based on decision diagrams, and argues that solver workloads with irregular node processing times would further amplify the advantages of this algorithm.

Abstract

Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are designed for general-purpose parallel runtimes and often incur overheads that are unnecessary in specialized settings. In this paper, we present a new lock-free work-stealing queue tailored for a master-worker framework used to parallelize a mixed-integer programming optimization solver based on decision diagrams. Our design supports native bulk operations, grows without bound, and assumes at most one owner and one concurrent stealer, thereby eliminating the need for heavy synchronization. We provide an informal proof sketch that our queue is linearizable and lock-free under this restricted concurrency model. Benchmarks demonstrate that our implementation achieves constant-latency push performance that remains stable as batch size increases, in contrast to existing queues from C++ Taskflow, whose latencies grow sharply with batch size. Pop operations perform comparably across all implementations, while our steal operation maintains nearly flat latency across different steal proportions. We also explore an optimized steal variant that reduces latency by up to 3x in practice. Finally, a pseudo workload based on large-graph exploration confirms that all implementations scale linearly; we argue, however, that solver workloads with irregular node processing times would further amplify the advantages of our algorithm.
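To make the idea of native bulk operations concrete, the following is a minimal single-threaded sketch, not the authors' lock-free algorithm. It assumes a singly linked list in which the owner splices a pre-linked batch onto the head in constant time and a stealer severs a proportion p of the nodes from the tail. The names Node, make_chain, and BulkQueue are hypothetical, and the sketch deliberately omits the atomic operations and memory ordering that an actual lock-free implementation would need. It also finds the cut point by walking the list, which is a simplification and says nothing about how the paper locates it.

```python
class Node:
    """One task in the list (hypothetical helper type)."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


def make_chain(values):
    """Pre-link a batch of values; returns (first, last, count)."""
    first = last = None
    count = 0
    for v in values:
        node = Node(v)
        if first is None:
            first = last = node
        else:
            last.next = node
            last = node
        count += 1
    return first, last, count


class BulkQueue:
    """Sequential illustration only: no atomics, not thread-safe."""

    def __init__(self):
        self.head = None
        self.size = 0

    def push_bulk(self, first, last, count):
        # Splice the whole batch in front of the head: O(1) for any batch size.
        if count == 0:
            return
        last.next = self.head
        self.head = first
        self.size += count

    def pop(self):
        # Owner removes a single task from the head.
        if self.head is None:
            return None
        node = self.head
        self.head = node.next
        self.size -= 1
        return node.value

    def steal(self, p):
        # Sever the last floor(size * p) nodes and return their values.
        k = int(self.size * p)
        if k == 0:
            return []
        keep = self.size - k
        if keep == 0:
            stolen = self.head
            self.head = None
        else:
            cut = self.head
            for _ in range(keep - 1):
                cut = cut.next
            stolen = cut.next
            cut.next = None  # the suffix becomes the stolen sublist
        self.size = keep
        out = []
        while stolen is not None:
            out.append(stolen.value)
            stolen = stolen.next
        return out


q = BulkQueue()
q.push_bulk(*make_chain([1, 2, 3, 4]))
q.push_bulk(*make_chain([5, 6]))
stolen_half = q.steal(0.5)  # takes the three oldest remaining tasks from the tail
```

In this sketch the cost of push_bulk does not depend on the batch size once the batch is linked, which is the property a per-element push loop lacks.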

Paper Structure (12 sections, 1 equation, 10 figures, 1 table)

Introduction
Decision Diagrams Overview
Contributions
Motivation
Requirements
Master-Worker model
Algorithm
API operations
Analysis
Evaluation
Related Work
Limitations and Conclusion

Figures (10)

Figure 1: Classic work-stealing deque model: the owner thread pushes and pops tasks at the top, while idle threads steal tasks from the bottom. A head pointer tracks the top of the deque.
Figure 2: Exact DD for the Knapsack problem (Eq. 1). The dashed arcs and the solid arcs correspond to the decisions of 0 and 1, respectively. The longest path from the root $r$ to the terminal $t$, highlighted in red, has length 15.
Figure 3: A restricted DD for the Knapsack problem (Eq. 1). The dashed arcs and the solid arcs correspond to the decisions of 0 and 1, respectively, for the decision variable. The longest path from the root $r$ to the terminal $t$, highlighted in red, has length 13.
Figure 4: A relaxed DD for the Knapsack problem (Eq. 1). The dashed arcs and the solid arcs correspond to the decisions of 0 and 1, respectively, for the decision variable. The longest path from the root $r$ to the terminal $t$, highlighted in red, has length 19.
Figure 5: Bulk stealing: the stealer removes a proportion $p$ (dashed region) from the tail (right) in a single operation. The queue is severed at the cut point (dashed line), after which the suffix becomes the stolen sublist. Example shown with $p=50\%$.
...and 5 more figures
